Abstract

Secret sharing over the fast-fading MIMO wiretap channel is considered. A source and a destination 
try to share secret information over a fast-fading MIMO channel in the presence of an eavesdropper who 
also makes channel observations that are different from but correlated to those made by the destination. An 
interactive, authenticated public channel with unlimited capacity is available to the source and destination 
for the secret sharing process. This situation is a special case of the "channel model with wiretapper" 
considered by Ahlswede and Csiszár. An extension of their result to continuous channel alphabets is 
employed to evaluate the key capacity of the fast-fading MIMO wiretap channel. The effects of spatial 
dimensionality provided by the use of multiple antennas at the source, destination, and eavesdropper are 
then investigated. 

I. Introduction 

The wiretap channel considered in the seminal paper [1] is the first example that demonstrates the 
possibility of secure communications at the physical layer. It is shown in [1] that a source can transmit 
a message at a positive (secrecy) rate to a destination in such a way that an eavesdropper only gathers 
information at a negligible rate, when the source-to-eavesdropper channel¹ is a degraded version of 
the source-to-destination channel. A similar result for the Gaussian wiretap channel is provided in [2]. 
The work in [3] further removes the degraded wiretap channel restriction showing that positive secrecy 
capacity is possible if the destination channel is "more capable" ("less noisy" for a full extension of 
the rate region in [1]) than the eavesdropper's channel. Recently, there has been a flurry of interest in 
extending these early results to more sophisticated channel models, including fading wiretap channels, 
multi-input multi-output (MIMO) wiretap channels, multiple-access wiretap channels, broadcast wiretap 
channels, relay wiretap channels, etc. We do not attempt to provide a comprehensive summary of all 

Tan F. Wong and John M. Shea are with the Wireless Information Networking Group, University of Florida, Gainesville, 
Florida, 32611-6130, USA. Matthieu Bloch is with the School of Electrical and Computer Engineering, Georgia Institute of 
Technology, Atlanta, GA, and with the GT-CNRS UMI 2958, 2-3 Rue Marconi, 57070, Metz, France. 

¹ The source-to-eavesdropper and source-to-destination channels will hereafter be referred to as eavesdropper and destination 
channels, respectively. 






recent developments, and highlight only results that are most relevant to the present work. We refer 
interested readers to the introduction and reference list of [4] for a concise and extensive overview of 
recent works. 

When the destination and eavesdropper channels experience independent fading, the strict requirement 
of having a more capable destination channel for positive secrecy capacity can be loosened. This is due to 
the simple observation that the destination channel may be more capable than the eavesdropper's channel 
under some fading realizations, even if the destination channel is not more capable than the eavesdropper's channel on 
average. Hence, if the channel state information (CSI) of both the destination and eavesdropper channels 
is available at the source, it is shown in [4], [5] that a positive secrecy capacity can be achieved by means 
of appropriate power control at the source. The key idea is to opportunistically transmit only during those 
fading realizations for which the destination channel is more capable [6]. For block-ergodic fading, it 
is also shown in [5] (see also [7]) that a positive secrecy capacity can be achieved with a variable-rate 
transmission scheme without any eavesdropper CSI available at the source. 

When the source, destination, and eavesdropper have multiple antennas, the resulting channel is known 
as a MIMO wiretap channel (see [8], [9], [10], [11], [12]), which may also have positive secrecy capacity. 
Since the MIMO wiretap channel is not degraded, the characterization of its secrecy capacity is not 
straightforward. For instance, the secrecy capacity of the MIMO wiretap channel is characterized in [9] 
as the saddle point of a minimax problem, while an alternative characterization based on a recent result for 
multi-antenna broadcast channels is provided in [11]. Interestingly, all characterizations point to the fact 
that the capacity achieving scheme is one that transmits only in the directions in which the destination 
channel is more capable than the eavesdropper's channel. Obviously, this is only possible when the 
destination and eavesdropper CSI is available at the source. It is shown in [9] that if the individual 
channels from antennas to antennas suffer from independent Rayleigh fading, and the respective ratios of 
the numbers of source and destination antennas to that of eavesdropper antennas are larger than certain 
fixed values, then the secrecy capacity is positive with probability one when the numbers of source, 
destination, and eavesdropper antennas become very large. 

As discussed above, the availability of destination (and eavesdropper) CSI at the source is an 
implicit requirement for positive secrecy capacity in the fading and MIMO wiretap channels. Thus, 
an authenticated feedback channel is needed to send the CSI from the destination back to the source. 
In [5], [7], this feedback channel is assumed to be public, and hence the destination CSI is also available 
to the eavesdropper. In addition, it is assumed that the eavesdropper knows its own CSI. With the 
availability of a feedback channel, if the objective of having the source send secret information to the 
destination is relaxed to distilling a secret key shared between the source and destination, it is shown 
in [13] that a positive key rate is achievable when the destination and eavesdropper channels are two 
conditionally independent (given the source input symbols) memoryless binary channels, even if the 
destination channel is not more capable than the eavesdropper's channel. This notion of secret sharing 
is formalized in [14] based on the concept of common randomness between the source and destination. 
Assuming the availability of an interactive, authenticated public channel with unlimited capacity between 
the source and destination, [14] suggests two different system models, called the "source model with 
wiretapper" (SW) and the "channel model with wiretapper" (CW). The CW model is similar to 
the (discrete memoryless) wiretap channel model that we have discussed before. The SW model differs 
in that the random symbols observed at the source, destination, and eavesdropper are realizations of a 
discrete memoryless source with multiple components. Both SW and CW models have been extended 
to the case of secret sharing among multiple terminals, with the possibility of some terminals acting 
as helpers [15], [16], [17]. Key capacities have been obtained for the two special cases in which the 
eavesdropper's channel is a degraded version of the destination channel and in which the destination and 
eavesdropper channels are conditionally independent [14], [13]. Similar results have been derived for 
multi-terminal secret sharing [16], [17], with the two special cases above subsumed by the more general 
condition that the terminal symbols form a Markov chain on a tree. Authentication of the public channel 
can be achieved by the use of an initial short key and then a small portion of the subsequent shared 
secret message [18]. A detailed study of secret sharing over an unauthenticated public channel is given 
in [19], [20], [21]. 

Other approaches to employ feedback have also been recently considered [22], [24], [23]. In particular, 
it is shown in [22] that positive secrecy capacity can be achieved for the modulo-additive discrete 
memoryless wiretap channel and the modulo-Λ channel if the destination is allowed to send signals 
back to the source over the same wiretap channel and both terminals can operate in a full-duplex manner. 
In fact, for the former channel, the secrecy capacity is the same as the capacity of such a channel in the 
absence of the eavesdropper. 

In this paper, we consider secret sharing over a fast-fading MIMO wiretap channel. Thus, we 
are interested in the CW model of [14] with memoryless conditionally independent destination and 
eavesdropper channels and continuous channel alphabets. We provide an extension of the key capacity 
result in [14] for this case to include continuous channel alphabets (Theorem 2.1). Using this result, we 
obtain the key capacity of the fast-fading MIMO wiretap channel (Section III). Our result indicates that 
the key capacity is always positive, no matter how large the channel gain of the eavesdropper's channel 
is; in addition, this holds even if the destination and eavesdropper CSI is available at the destination and 
eavesdropper, respectively. Of course, the availability of the public channel implies that the destination 
CSI could be fed back to the source. However, due to the restrictions imposed on the secret-sharing 
strategies (see Section II), only causal feedback is allowed, and thus any destination CSI available at 
the source is "outdated". This does not turn out to be a problem since, unlike the approaches mentioned above, 
the source does not use the CSI to avoid sending secret information when the destination channel is not more 
capable than the eavesdropper's channel. As a matter of fact, the fading process of the destination channel 
provides a significant part of the common randomness from which the source and the destination distill a 
secret key. This fact is readily obtained from the alternative achievability proof given in Section IV. We 
note that [25], [26] consider the problem of key generation from common randomness over wiretap channels 
and exploit a Wyner-Ziv coding scheme to limit the amount of information conveyed from the source to 
the destination via the wiretap channel. Unlike these previous works, we only employ Wyner-Ziv coding 
to quantize the destination channel outputs. Our code construction still relies on a public channel with 
unlimited capacity to achieve the key capacity. 

Finally, we also investigate the limiting value of the key capacity under three asymptotic scenarios. In 
the first scenario, the transmission power of the source becomes asymptotically high (Corollary 3.1). In 
the second scenario, the destination and eavesdropper have a large number of antennas (Corollary 3.2). 
In the third scenario, the gain advantage of the eavesdropper's channel becomes asymptotically large 
(Corollary 3.3). These three scenarios reveal two different effects of spatial dimensionality upon key 
capacity. In the first scenario, we show that the key capacity levels off as the power increases if the 
eavesdropper has no fewer antennas than the source. On the other hand, when the source has more 
antennas, the key capacity can increase without bound with the source power. In the second scenario, we 
show that the spatial dimensionality advantage that the eavesdropper has over the destination has exactly 
the same effect as the channel gain advantage of the eavesdropper. In the third scenario, we show that 
the limiting key capacity is positive only if the eavesdropper has fewer antennas than the source. The 
results in these scenarios confirm that spatial dimensionality can be used to combat the eavesdropper's 
gain advantage, which was already observed for the MIMO wiretap channel. Perhaps more surprisingly, 
this is achieved with neither the source nor destination needing any eavesdropper CSI. 



II. Secret Sharing and Key Capacity 

We consider the CW model of [14], and we recall its characteristics for completeness. We consider 
three terminals, namely a source, a destination, and an eavesdropper. The source sends symbols from 
an alphabet $\mathcal{X}$. The destination and eavesdropper observe symbols belonging to alphabets $\mathcal{Y}$ and $\mathcal{Z}$, 
respectively. Unlike in [14], $\mathcal{X}$, $\mathcal{Y}$, and $\mathcal{Z}$ need not be discrete. In fact, in Section III we will assume 
they are multi-dimensional vector spaces over the complex field. The channel from the source to the 
destination and eavesdropper is assumed memoryless. A generic symbol sent by the source is denoted by 
X and the corresponding symbols observed by the destination and eavesdropper are denoted by Y and 
Z, respectively. For notational convenience (and without loss of generality), we assume that (X, Y, Z) 
are jointly continuous, and the channel is specified by the conditional probability density function 
(pdf) $p_{Y,Z|X}(y,z|x)$. In addition, we restrict ourselves to cases in which Y and Z are conditionally 
independent given X, i.e., $p_{Y,Z|X}(y,z|x) = p_{Y|X}(y|x)\,p_{Z|X}(z|x)$, which is a reasonable model for 
symbols broadcast in a wireless medium. Hereafter, we drop the subscripts in pdfs whenever the 
concerned symbols are well specified by the arguments of the pdfs. We assume that an interactive, 
authenticated public channel with unlimited capacity is also available for communication between the 
source and destination. Here, interactive means that the channel is two-way and can be used multiple 
times, unlimited capacity means that it is noiseless and has infinite capacity, and public and authenticated 
mean that the eavesdropper can perfectly observe all communications over this channel but cannot tamper 
with the messages transmitted. 

We consider the class of permissible secret-sharing strategies suggested in [14]. Consider k time instants 
labeled by 1, 2, ..., k, respectively. The (X, Y, Z) channel is used n times during these k time instants 
at $i_1 < i_2 < \cdots < i_n$. Set $i_{n+1} = k$. The public channel is used for the other (k - n) time instants. 
Before the secret-sharing process starts, the source and destination generate, respectively, independent 
random variables $M_X$ and $M_Y$. To simplify the notation, let $a^i$ represent a sequence of messages/symbols 
$a_1, a_2, \ldots, a_i$. Then a permissible strategy proceeds as follows: 

• At time instant $0 < i < i_1$, the source sends message $\Phi_i = \Phi_i(M_X, \Psi^{i-1})$ to the destination, and 
the destination sends message $\Psi_i = \Psi_i(M_Y, \Phi^{i-1})$ to the source. Both transmissions are carried 
over the public channel. 

• At time instant $i = i_j$ for j = 1, 2, ..., n, the source sends the symbol $X_j = X_j(M_X, \Psi^{i_j-1})$ to 
the (X, Y, Z) channel. The destination and eavesdropper observe the corresponding symbols $Y_j$ and 
$Z_j$. There is no message exchange via the public channel, i.e., $\Phi_i$ and $\Psi_i$ are both null. 

• At time instant $i_j < i < i_{j+1}$ for j = 1, 2, ..., n, the source sends message $\Phi_i = \Phi_i(M_X, \Psi^{i-1})$ 
to the destination, and the destination sends message $\Psi_i = \Psi_i(M_Y, Y^j, \Phi^{i-1})$ to the source. Both 
transmissions are carried over the public channel. 

At the end of the k time instants, the source generates its secret key $K = K(M_X, \Psi^k)$, and the destination 
generates its secret key $L = L(M_Y, Y^n, \Phi^k)$, where K and L take values from the same finite set $\mathcal{K}$. 

According to [14], R is an achievable key rate through the channel (X, Y, Z) if for every $\epsilon > 0$, there 
exists a permissible secret-sharing strategy of the form described above such that 

1) $\Pr\{K \neq L\} < \epsilon$, 

2) $\frac{1}{n} I(K; Z^n, \Phi^k, \Psi^k) < \epsilon$, 

3) $\frac{1}{n} H(K) > R - \epsilon$, and 

4) $\frac{1}{n} \log|\mathcal{K}| < \frac{1}{n} H(K) + \epsilon$, 

for sufficiently large n. The key capacity of the channel (X, Y, Z) is the largest achievable key rate 
through the channel. We are interested in finding the key capacity. For the case of continuous channel 
alphabets considered here, we also add the following power constraint to the symbol sequence $X^n$ sent 
out by the source: 

$$\frac{1}{n} \sum_{j=1}^{n} |X_j|^2 \le P \qquad (1)$$

with probability one (w.p. 1) for sufficiently large n. 

Theorem 2.1: The key capacity of a CW model (X, Y, Z) with conditional pdf $p(y,z|x) = p(y|x)\,p(z|x)$ 
is given by $\max_{X: E[|X|^2] \le P} [I(X;Y) - I(Y;Z)]$. 

Proof: The case with discrete channel alphabets is established in [14, Corollary 2 of Theorem 2], 
whose achievability proof (also the ones in [16], [17]) does not readily extend to continuous channel 
alphabets. Nevertheless, the same single backward message strategy suggested in [14] is still applicable 
for continuous alphabets. That strategy uses k = n + 1 time instants with $i_j = j$ for j = 1, 2, ..., n. That 
is, the source first sends n symbols through the (X, Y, Z) channel; after receiving these n symbols, the 
destination feeds back a single message at the last time instant to the source over the public channel. A 
carefully structured Wyner-Ziv code can be employed to support this secret-sharing strategy. The detailed 
arguments are provided in the alternative achievability proof in Section IV. 
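
As an illustration only, the bookkeeping of time instants in a permissible strategy, and in the single backward message strategy in particular, can be sketched in Python. The function names below are hypothetical and do not come from [14]; the sketch only labels which instants carry a use of the (X, Y, Z) channel and which carry public-channel messages, and it does not implement any coding.

```python
def label_time_instants(k, channel_instants):
    """Label instants 1..k as 'channel' or 'public'.

    channel_instants plays the role of i_1 < i_2 < ... < i_n; every other
    instant is assumed to be a public-channel exchange.
    """
    instants = list(channel_instants)
    if instants != sorted(set(instants)):
        raise ValueError("channel instants must be strictly increasing")
    if instants and (instants[0] < 1 or instants[-1] > k):
        raise ValueError("channel instants must lie in 1..k")
    used = set(instants)
    return ["channel" if i in used else "public" for i in range(1, k + 1)]


def single_backward_message_schedule(n):
    """k = n + 1 and i_j = j: n channel uses, then one public feedback slot."""
    return label_time_instants(n + 1, range(1, n + 1))


if __name__ == "__main__":
    print(single_backward_message_schedule(3))
```

With n = 3 the only public-channel instant is the last one, which is where the destination feeds back its single message.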



Here we outline an achievability argument based on the consideration of a conceptual wiretap channel 
from the destination back to the source and eavesdropper suggested in [13, Theorem 3]. First, assume 
the source sends a sequence of i.i.d. symbols $X^n$, each distributed according to $p(x)$, over the wiretap 
channel. Suppose that $E[|X|^2] \le P$. Because of the law of large numbers, we can assume that $X^n$ 
satisfies the power constraint (1) without loss of generality. Let $Y^n$ and $Z^n$ be the observations of the 
destination and eavesdropper, respectively. To transmit a sequence $U^n$ of symbols independent of 
$(X^n, Y^n, Z^n)$, the destination sends $U^n + Y^n$ back to the source via the public channel. This creates a 
conceptual memoryless wiretap channel from the destination with input symbol U to the source in the 
presence of the eavesdropper, where the source observes (U + Y, X) while the eavesdropper observes 
(U + Y, Z). 

Employing the continuous alphabet extension of the well known result in [3], the secrecy capacity of 
the conceptual wiretap channel (and hence the key capacity of the original channel) is lower bounded by 

$$\max_{U} \left[ I(U; U+Y, X) - I(U; U+Y, Z) \right].$$

Note that the input symbol U has no power constraint since the public channel has infinite capacity. But 

$$
\begin{aligned}
& I(U; U+Y, X) - I(U; U+Y, Z) \\
&= I(U;X) + I(U;U+Y|X) - [I(U;Z) + I(U;U+Y|Z)] \\
&= h(U) - h(U|X) + h(U+Y|X) - h(U+Y|U,X) - h(U) + h(U|Z) - h(U+Y|Z) + h(U+Y|U,Z) \\
&= h(Y|Z) - h(Y|X) + [h(U+Y|X) - h(U|X)] - [h(U+Y|Z) - h(U|Z)] \\
&\ge h(Y|Z) - h(Y|X) - [h(U+Y|Z) - h(U|Z)] \\
&\ge h(Y|Z) - h(Y|X) - [h(U+Y) - h(U)] \qquad (2)
\end{aligned}
$$

where the equality on the fourth line results from $h(U+Y|U,X) = h(Y|U,X) = h(Y|X)$ (and likewise with Z 
in place of X) due to the independence of U and (X, Y, Z), the inequality on the fifth line follows from the fact 

$$h(U+Y|X) - h(U|X) \ge h(U+Y|X,Y) - h(U|X) = h(U|X,Y) - h(U|X) = 0,$$

which is again due to independence between (X, Y) and U, and the inequality on the last line follows 
from $h(U+Y|Z) - h(U|Z) = h(U+Y|Z) - h(U) \le h(U+Y) - h(U)$. 

Without loss of generality and for notational simplicity, assume that Y and U are both one-dimensional 
real random variables. Now, choose U to be Gaussian distributed with mean 0 and variance $\sigma_U^2$. Then 

$$h(U+Y) - h(U) \le \frac{1}{2}\log\left(2\pi e\,\mathrm{var}(U+Y)\right) - \frac{1}{2}\log\left(2\pi e\,\sigma_U^2\right) = \frac{1}{2}\log\left(1 + \frac{\mathrm{var}(Y)}{\sigma_U^2}\right) \qquad (3)$$

where the inequality follows from [27, Theorem 8.6.5] and the last equality is due to the independence 
between Y and U. Combining (2) and (3), for every $\epsilon > 0$, we can choose $\sigma_U^2$ large enough such that 

$$I(U; U+Y, X) - I(U; U+Y, Z) \ge h(Y|Z) - h(Y|X) - \epsilon = I(X;Y) - I(Y;Z) - \epsilon.$$

Since $\epsilon$ is arbitrary, the key capacity is lower bounded by $\max_{X: E[|X|^2] \le P} [I(X;Y) - I(Y;Z)]$. 
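
To make the lower bound concrete, the following Python sketch evaluates I(X;Y) - I(Y;Z) for a toy case that is not treated in the text: a real scalar, non-fading Gaussian channel Y = X + N_D, Z = alpha X + N_W with a Gaussian input of variance P (assumed here for illustration, not claimed to be optimal). The function names are hypothetical. The sketch also evaluates the Gaussian maximum-entropy bound (1/2) log(1 + var(Y)/var(U)) on h(U + Y) - h(U), which vanishes as the masking variance grows. Rates are in nats.

```python
import math


def key_rate_gaussian_input(P, var_d, var_w, alpha):
    """I(X;Y) - I(Y;Z) for Y = X + N_D, Z = alpha*X + N_W, X ~ N(0, P)."""
    i_xy = 0.5 * math.log(1.0 + P / var_d)
    var_y = P + var_d
    var_z = alpha ** 2 * P + var_w
    rho_sq = (alpha * P) ** 2 / (var_y * var_z)  # squared correlation of Y and Z
    i_yz = -0.5 * math.log(1.0 - rho_sq)
    return i_xy - i_yz


def key_rate_closed_form(P, var_d, var_w, alpha):
    """Same quantity written as I(X;Y|Z) for jointly Gaussian (X, Y, Z)."""
    return 0.5 * math.log(1.0 + (P / var_d) / (1.0 + alpha ** 2 * P / var_w))


def masking_penalty_bound(var_y, var_u):
    """Upper bound on h(U+Y) - h(U) when U is Gaussian and independent of Y."""
    return 0.5 * math.log(1.0 + var_y / var_u)


if __name__ == "__main__":
    # Eavesdropper amplitude gain of 3: the rate is small but still positive.
    print(key_rate_gaussian_input(10.0, 1.0, 1.0, 3.0))
    print(masking_penalty_bound(11.0, 1e6))
```

In this toy case the rate stays strictly positive for any finite alpha, which mirrors the qualitative claim that the key capacity does not require the destination channel to be more capable.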
The converse proof in [14] is directly applicable to continuous channel alphabets, provided the average 
power constraint (1) can be incorporated into the arguments in [14, pp. 1129-1130]. This latter requirement 
is simplified by the additive and symmetric nature of the average power constraint [28, Section 3.6]. To 
avoid too much repetition, we outline below only the steps of the proof that are not directly available in 
[14, pp. 1129-1130]. 

For every permissible strategy with achievable key rate R, we have 

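$$
\begin{aligned}
\frac{1}{n} I(K;L) &= \frac{1}{n} H(K) - \frac{1}{n} H(K|L) \\
&\ge \frac{1}{n} H(K) - \frac{1}{n}\left[1 + \Pr\{K \neq L\} \log|\mathcal{K}|\right] \\
&\ge \frac{1}{n} H(K) - \frac{1}{n} - \epsilon\left[\frac{1}{n} H(K) + \epsilon\right] \\
&\ge (1-\epsilon)(R-\epsilon) - \frac{1}{n} - \epsilon^2 \qquad (4)
\end{aligned}
$$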
n 

where the second line follows from Fano's inequality, the third line results from conditions 1) and 4) in 
the definition of achievable key rate, and the last line is due to condition 3). Thus it suffices to upper 
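bound I(K; L). From condition 2) in the definition of achievable key rate and the chain rule, we have 

$$
\begin{aligned}
\frac{1}{n} I(K;L) &< \frac{1}{n} I(K; L | Z^n, \Phi^k, \Psi^k) + \epsilon \\
&\le \frac{1}{n} I(M_X; M_Y, Y^n | Z^n, \Phi^k, \Psi^k) + \epsilon \qquad (5)
\end{aligned}
$$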

where the second inequality is due to the fact that $K = K(M_X, \Psi^k)$ and $L = L(M_Y, Y^n, \Phi^k)$. By 
repeated uses of the chain rule, the construction of permissible strategies, and the memoryless nature of 
the (X, Y, Z) channel, it is shown in [14, pp. 1129-1130] that 

$$\frac{1}{n} I(M_X; M_Y, Y^n | Z^n, \Phi^k, \Psi^k) \le \frac{1}{n} \sum_{j=1}^{n} I(X_j; Y_j | Z_j). \qquad (6)$$

Now let Q be a uniform random variable that takes value from {1, 2, ..., n}, and is independent 
of all other random quantities. Define $(\tilde{X}, \tilde{Y}, \tilde{Z}) = (X_j, Y_j, Z_j)$ if Q = j. Then it is obvious that 
$p_{\tilde{Y},\tilde{Z}|\tilde{X}}(y,z|x) = p_{Y,Z|X}(y,z|x)$, and (6) can be rewritten as 

$$\frac{1}{n} I(M_X; M_Y, Y^n | Z^n, \Phi^k, \Psi^k) \le I(\tilde{X}; \tilde{Y} | \tilde{Z}, Q) \le I(\tilde{X}; \tilde{Y} | \tilde{Z}) \qquad (7)$$

where the second inequality is due to the fact that $Q \to \tilde{X} \to (\tilde{Y}, \tilde{Z})$ forms a Markov chain. On the 
other hand, the power constraint (1) implies that 

$$E[|\tilde{X}|^2] = \frac{1}{n} \sum_{j=1}^{n} E[|X_j|^2] \le P. \qquad (8)$$
Combining (4), (5), and (7), we obtain 

$$R < \frac{1}{1-\epsilon}\left[I(\tilde{X}; \tilde{Y} | \tilde{Z}) + 2\epsilon + \frac{1}{n}\right]. \qquad (9)$$

Since $\epsilon$ can be arbitrarily small when n is sufficiently large, (9), together with (8), gives 

$$R \le \max_{X: E[|X|^2] \le P} I(X; Y | Z).$$
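
The converse bounds the key rate by the conditional mutual information I(X;Y|Z), whereas Theorem 2.1 is stated in terms of I(X;Y) - I(Y;Z). As a short check (a standard chain-rule argument, written out here only for convenience), the two agree whenever Y and Z are conditionally independent given X, as assumed in the CW model:

```latex
\begin{align*}
I(X;Y|Z) &= I(X,Z;Y) - I(Y;Z) \\
         &= I(X;Y) + I(Z;Y|X) - I(Y;Z) \\
         &= I(X;Y) - I(Y;Z),
\end{align*}
```

where the last step uses $I(Z;Y|X) = 0$, which holds because $p(y,z|x) = p(y|x)\,p(z|x)$.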




III. Key Capacity of Fast Fading MIMO Wiretap Channel 



Consider that the source, destination, and eavesdropper have $m_S$, $m_D$, and $m_W$ antennas, respectively. 
The antennas in each node are separated by at least a few wavelengths, and hence the fading processes 
of the channels across the transmit and receive antennas are independent. We use the complex baseband 
representation of the bandpass channel model: 

$$
\begin{aligned}
Y_D &= H_D X + N_D \\
Y_W &= \alpha H_W X + N_W
\end{aligned}
\qquad (10)
$$

where 

• X is the $m_S \times 1$ complex-valued transmit symbol vector by the source, 

• $Y_D$ is the $m_D \times 1$ complex-valued receive symbol vector at the destination, 

• $Y_W$ is the $m_W \times 1$ complex-valued receive symbol vector at the eavesdropper, 

• $N_D$ is the $m_D \times 1$ noise vector with independent identically distributed (i.i.d.) zero-mean, circular-symmetric 
complex Gaussian-distributed elements of variance $\sigma_D^2$ (i.e., the real and imaginary parts 
of each element are independent zero-mean Gaussian random variables with the same variance), 

• $N_W$ is the $m_W \times 1$ noise vector with i.i.d. zero-mean, circular-symmetric complex Gaussian-distributed 
elements of variance $\sigma_W^2$, 

• $H_D$ is the $m_D \times m_S$ channel matrix from the source to destination with i.i.d. zero-mean, circular-symmetric 
complex Gaussian-distributed elements of unit variance, 

• $H_W$ is the $m_W \times m_S$ channel matrix from the source to eavesdropper with i.i.d. zero-mean, circular-symmetric 
complex Gaussian-distributed elements of unit variance, 

• $\alpha > 0$ models the gain advantage of the eavesdropper over the destination. 

Note that $H_D$, $H_W$, $N_D$, and $N_W$ are independent. The wireless channel modeled by (10) is used n times 
as the (X, Y, Z) channel described in Section II with $Y = [Y_D \; H_D]$ and $Z = [Y_W \; H_W]$. We assume that 
the n uses of the wireless channel in (10) are i.i.d. so that the memoryless requirement of the (X, Y, Z) 
channel is satisfied. Since $H_D$ and $H_W$ are included in the respective channel symbols observable by 
the destination and eavesdropper (i.e., Y and Z respectively), this model also implicitly assumes that the 
destination and eavesdropper have perfect CSI of their respective channels from the source. In practice, 
we can separate adjacent uses of the wireless channel by more than the coherence time of the channel 
to approximately ensure the i.i.d. channel use assumption. Training (known) symbols can be sent right 
before or after (within the channel coherence period) by the source so that the destination can acquire 
the required CSI. The eavesdropper may also use these training symbols to acquire the CSI of its own 
channel. If the CSI required at the destination is obtained in the way just described, then a unit of channel 
use includes the symbol X together with the associated training symbols. However, as in [29], we do not 
count the power required to send the training symbols (cf. Eq. (1)). Moreover, we note that the source 
(and also the eavesdropper) may get some information about the outdated CSI of the destination channel, 
because information about the destination channel CSI, up to the previous use, may be fed back to the 
source from the destination via the public channel. More specifically, at time instant $i_j$, the source symbol 
$X_j$ is a function of the feedback message $\Psi^{i_j - 1}$, which is in turn some function of the realizations of 
$H_D$ at times $i_1, i_2, \ldots, i_{j-1}$. We also note that neither the source nor destination has any eavesdropper 
CSI. Referring back to (10), these two facts imply that X is independent of $H_D$, $H_W$, $N_D$, and $N_W$, 
i.e., the current source symbol X is independent of the current channel state. 



Since the fading MIMO wiretap channel model in (10) is a special case of the CW model considered 
in Section II, the key capacity $C_K$ is given by Theorem 2.1 as: 

$$C_K = \max_{X: E[|X|^2] \le P} \left[ I(X; Y_D, H_D) - I(Y_D, H_D; Y_W, H_W) \right]. \qquad (11)$$

Note that 

$$
\begin{aligned}
I(X; Y_D, H_D) - I(Y_D, H_D; Y_W, H_W) &= I(X; Y_D | H_D) - I(Y_D; Y_W | H_D, H_W) \\
&= h(Y_D | Y_W, H_D, H_W) - h(Y_D | X, H_D) \\
&= h(Y_D | Y_W, H_D, H_W) - m_D \log(\pi e \sigma_D^2). \qquad (12)
\end{aligned}
$$

Substituting this back into (11), we get 

$$C_K = \max_{X: E[|X|^2] \le P} h(Y_D | Y_W, H_D, H_W) - m_D \log(\pi e \sigma_D^2). \qquad (13)$$
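
The constant subtracted from the conditional entropy of the destination output is the differential entropy of the destination noise. A brief justification, spelled out here as a reading aid and using only the model assumptions stated above (the noise at the destination is independent of the source symbol and of the destination channel matrix, and has i.i.d. circular-symmetric complex Gaussian entries):

```latex
\begin{align*}
h(Y_D \mid X, H_D) &= h(H_D X + N_D \mid X, H_D) \\
                   &= h(N_D \mid X, H_D) = h(N_D) \\
                   &= \log\det\left(\pi e\, \sigma_D^2 I_{m_D}\right)
                    = m_D \log\left(\pi e\, \sigma_D^2\right).
\end{align*}
```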
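As a result, the key capacity of the fast-fading wiretap channel described by (10) can be obtained by 
maximizing the conditional entropy $h(Y_D | Y_W, H_D, H_W)$. This maximization problem is solved below: 

Theorem 3.1: 

$$C_K = E\left[\log \frac{\det\left(I_{m_S} + \frac{\alpha^2 P}{m_S \sigma_W^2} H_W^\dagger H_W + \frac{P}{m_S \sigma_D^2} H_D^\dagger H_D\right)}{\det\left(I_{m_S} + \frac{\alpha^2 P}{m_S \sigma_W^2} H_W^\dagger H_W\right)}\right]$$

where $\dagger$ denotes conjugate transpose. 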

Proof: To determine the key capacity, we need the following upper bound on the conditional entropy 
$h(U|V)$: 

Lemma 3.1: Let U and V be two jointly distributed complex random vectors of dimensions $m_U$ and 
$m_V$, respectively. Let $K_U$, $K_V$, and $K_{UV}$ be the covariance of U, covariance of V, and cross-covariance 
of U and V, respectively. If $K_V$ is invertible, then 

$$h(U|V) \le \log\det\left(K_U - K_{UV} K_V^{-1} K_{VU}\right) + m_U \log(\pi e).$$

The upper bound is achieved when $[U^T \; V^T]^T$ is a circular-symmetric complex Gaussian random vector. 

Proof: We can assume that both U and V have zero means without loss of generality. Also assume 
the existence of all unconditional and conditional covariances stated below. For each v, 

$$h(U | V = v) \le \log\left((\pi e)^{m_U} \det(K_{U|v})\right) \qquad (14)$$

where $K_{U|v}$ is the covariance of U with respect to the conditional density $p_{U|V}(u|v)$ [29, Lemma 2]. 
This implies 

$$
\begin{aligned}
h(U|V) &\le E_V\left[\log\left((\pi e)^{m_U} \det(K_{U|V})\right)\right] \\
&\le \log\det\left(E_V[K_{U|V}]\right) + m_U \log(\pi e) \\
&\le \log\det\left(K_U - K_{UV} K_V^{-1} K_{VU}\right) + m_U \log(\pi e). \qquad (15)
\end{aligned}
$$

The second inequality above is due to the concavity of the function log det over the set of positive 
definite symmetric matrices [30, 7.6.7] and Jensen's inequality. To get the third inequality, observe that 
$E_V[K_{U|V}]$ can be interpreted as the covariance of the estimation error of estimating U by the conditional 
mean estimator $E[U|V]$. On the other hand, $K_U - K_{UV} K_V^{-1} K_{VU}$ is the covariance of the estimation 
error of using the linear minimum mean squared error estimator $K_{UV} K_V^{-1} V$ instead. The inequality 
results from the fact that $K_U - K_{UV} K_V^{-1} K_{VU} \ge E_V[K_{U|V}]$ (i.e., $[K_U - K_{UV} K_V^{-1} K_{VU}] - E_V[K_{U|V}]$ 
is positive semidefinite) [31] and the inequality $\det(A) \ge \det(B)$ if A and B are positive definite 
and $A \ge B$ [30, 7.7.4]. 

Suppose that $[U^T \; V^T]^T$ is a circular-symmetric complex Gaussian random vector. For each v, the 
conditional covariance of U, conditioned on V = v, is the same as the (unconditional) covariance of 
$U - K_{UV} K_V^{-1} V$. Since $U - K_{UV} K_V^{-1} V$ is a circular-symmetric complex Gaussian random vector [29, 
Lemma 3], so is U conditioned on V = v. Hence by [29, Lemma 2], the upper bound in (14) is achieved 
with $K_{U|v} = K_U - K_{UV} K_V^{-1} K_{VU}$, which also gives the upper bound in (15). ■ 



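To prove the theorem, we first obtain an upper bound on $C_K$ and then show that the upper bound is 
achievable. Using Lemma 3.1, we have 

$$h(Y_D | Y_W, H_D, H_W) - m_D \log(\pi e \sigma_D^2) \le E\left[\log\det\left(K_{Y_D} - K_{Y_D Y_W} K_{Y_W}^{-1} K_{Y_W Y_D}\right)\right] - m_D \log \sigma_D^2 \qquad (16)$$

where $K_{Y_D}$ and $K_{Y_W}$ are respectively the conditional covariances of $Y_D$ and $Y_W$, given $H_D$ and $H_W$, 
and $K_{Y_D Y_W}$ and $K_{Y_W Y_D}$ are the corresponding conditional cross-covariances. Substituting (16) into (13), 
an upper bound on $C_K$ is 

$$\max_{X: E[|X|^2] \le P} E\left[\log\det\left(K_{Y_D} - K_{Y_D Y_W} K_{Y_W}^{-1} K_{Y_W Y_D}\right)\right] - m_D \log \sigma_D^2. \qquad (17)$$

Thus we need to solve the maximization problem (17). To do so, let $\lambda_1, \lambda_2, \ldots, \lambda_{m_S}$ be the (nonnegative) 
eigenvalues of $K_X$. Since both the distributions of $H_D$ and $H_W$ are invariant to any unitary transformation 
[29, Lemma 5], we can without any ambiguity define 

$$f(\lambda_1, \lambda_2, \ldots, \lambda_{m_S}) = E\left[\log\det\left(I_{m_D} + \frac{1}{\sigma_D^2} H_D K_X^{1/2} \left(I_{m_S} + \frac{\alpha^2}{\sigma_W^2} K_X^{1/2} H_W^\dagger H_W K_X^{1/2}\right)^{-1} K_X^{1/2} H_D^\dagger\right)\right]. \qquad (18)$$

That is, we can assume $K_X = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_{m_S})$ with no loss of generality. Then we have the 
following lemma, which suggests that the objective function in (17) is a concave function depending 
only on the eigenvalues of the covariance of X: 

Lemma 3.2: Suppose that X has an arbitrary covariance $K_X$, whose (nonnegative) eigenvalues are 
$\lambda_1, \lambda_2, \ldots, \lambda_{m_S}$. Then 

$$E\left[\log\det\left(K_{Y_D} - K_{Y_D Y_W} K_{Y_W}^{-1} K_{Y_W Y_D}\right)\right] - m_D \log \sigma_D^2 = f(\lambda_1, \lambda_2, \ldots, \lambda_{m_S}) \qquad (19)$$

is concave in $\Lambda = \{\lambda_i \ge 0 \text{ for } i = 1, 2, \ldots, m_S\}$. 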



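Proof: First write $A_D = H_D K_X^{1/2}$ and $A_W = \alpha H_W K_X^{1/2}$. It is easy to see from (10) that 
$K_{Y_D} = A_D A_D^\dagger + \sigma_D^2 I_{m_D}$, $K_{Y_W} = A_W A_W^\dagger + \sigma_W^2 I_{m_W}$, and $K_{Y_D Y_W} = A_D A_W^\dagger$. 
Then 

$$
\begin{aligned}
K_{Y_D} - K_{Y_D Y_W} K_{Y_W}^{-1} K_{Y_W Y_D} &= \sigma_D^2 I_{m_D} + A_D \left[I_{m_S} - A_W^\dagger \left(A_W A_W^\dagger + \sigma_W^2 I_{m_W}\right)^{-1} A_W\right] A_D^\dagger \\
&= \sigma_D^2 I_{m_D} + A_D \left[I_{m_S} + \frac{1}{\sigma_W^2} A_W^\dagger A_W\right]^{-1} A_D^\dagger \qquad (20)
\end{aligned}
$$

where the last equality is due to the matrix inversion formula. Substituting this result into the left hand 
side of (19), we obtain the right hand side of (18), and hence (19). 

To show concavity of f, it suffices to consider only diagonal $K_X = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_{m_S})$ in 
$\Lambda$. Note that the mapping 

$$H : K_X \mapsto \begin{bmatrix} K_{Y_D} & K_{Y_D Y_W} \\ K_{Y_W Y_D} & K_{Y_W} \end{bmatrix}$$

is linear in $\Lambda$. Also the mapping 

$$F : \begin{bmatrix} K_{Y_D} & K_{Y_D Y_W} \\ K_{Y_W Y_D} & K_{Y_W} \end{bmatrix} \mapsto K_{Y_D} - K_{Y_D Y_W} K_{Y_W}^{-1} K_{Y_W Y_D}$$

is matrix-concave in $H(K_X)$ [32, Ex. 3.58]. 
Thus the composition theorem [32] gives that the mapping $G : K_X \mapsto K_{Y_D} - K_{Y_D Y_W} K_{Y_W}^{-1} K_{Y_W Y_D}$ is 
matrix-concave in $\Lambda$, since $G = F \circ H$. Another use of the composition theorem together with the concavity 
of the function log det as mentioned in the proof of Lemma 3.1 shows that $\log\det G$ is concave in $\Lambda$. 
Thus (19) implies that f is also concave in $\Lambda$. ■ 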



Hence it suffices to consider only those X with zero mean in (17). 



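Now define the constraint set $\Lambda_P = \{\lambda_i \ge 0 \text{ for } i = 1, 2, \ldots, m_S \text{ and } \sum_{i=1}^{m_S} \lambda_i \le P\}$. Lemma 3.2 
implies that we can find the upper bound on $C_K$ by calculating $\max_{\Lambda_P} f(\lambda_1, \lambda_2, \ldots, \lambda_{m_S})$, whose value 
is given by the next lemma: 

Lemma 3.3: $\max_{\Lambda_P} f(\lambda_1, \lambda_2, \ldots, \lambda_{m_S}) = f\left(\frac{P}{m_S}, \frac{P}{m_S}, \ldots, \frac{P}{m_S}\right)$. 

Proof: Since the elements of both $H_D$ and $H_W$ are i.i.d., f is invariant to any permutation of 
its arguments. This means that f is a symmetric function. By Lemma 3.2, f is also concave in $\Lambda_P$. 
Thus it is Schur-concave [33]. Hence a Schur-minimal element (an element majorized by any other 
element) in $\Lambda_P$ maximizes f. It is easy to check that $\left(\frac{P}{m_S}, \frac{P}{m_S}, \ldots, \frac{P}{m_S}\right)$ is Schur-minimal in $\Lambda_P$. Hence 
$\max_{\Lambda_P} f(\lambda_1, \lambda_2, \ldots, \lambda_{m_S}) = f\left(\frac{P}{m_S}, \frac{P}{m_S}, \ldots, \frac{P}{m_S}\right)$. ■ 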
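Combining the results in (17), (18), Lemmas 3.2 and 3.3, we obtain the upper bound on the key capacity as
\[
\begin{aligned}
C_K &\le E\left[ \log\det\left( I_{m_D} + \frac{P}{m_S \sigma_D^2} H_D \left( I_{m_S} + \frac{\alpha^2 P}{m_S \sigma_W^2} H_W^\dagger H_W \right)^{-1} H_D^\dagger \right) \right] \\
&= E\left[ \log \frac{\det\left( I_{m_S} + \frac{\alpha^2 P}{m_S \sigma_W^2} H_W^\dagger H_W + \frac{P}{m_S \sigma_D^2} H_D^\dagger H_D \right)}{\det\left( I_{m_S} + \frac{\alpha^2 P}{m_S \sigma_W^2} H_W^\dagger H_W \right)} \right],
\end{aligned}
\tag{21}
\]
where the identity $\det(I + U V^{-1} U^\dagger) = \frac{\det(V + U^\dagger U)}{\det(V)}$ for invertible $V$ [34, Theorem 18.1.1] has been used.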
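The determinant identity $\det(I + U V^{-1} U^\dagger) = \det(V + U^\dagger U)/\det(V)$ can be sanity-checked numerically. The sketch below is only an illustration: the matrix sizes, the seed, and the helper `crandn` are assumptions introduced here, not part of the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
m, k = 3, 4  # hypothetical sizes: U is m x k, V is k x k


def crandn(rows, cols):
    """Illustrative helper: i.i.d. circular-symmetric complex Gaussian entries."""
    return (rng.standard_normal((rows, cols))
            + 1j * rng.standard_normal((rows, cols))) / np.sqrt(2)


U = crandn(m, k)
B = crandn(k, k)
V = B @ B.conj().T + np.eye(k)  # Hermitian positive definite, hence invertible

# Left- and right-hand sides of the identity.
lhs = np.linalg.det(np.eye(m) + U @ np.linalg.inv(V) @ U.conj().T)
rhs = np.linalg.det(V + U.conj().T @ U) / np.linalg.det(V)

identity_holds = bool(np.isclose(lhs, rhs))
```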

On the other hand, consider choosing $X$ to have i.i.d. zero-mean, circular-symmetric complex Gaussian-distributed elements of variance $P/m_S$. Then conditioned on $H_D$ and $H_W$, $[Y_D^T \; Y_W^T]^T$ is a circular-symmetric complex Gaussian random vector, by applying [29, Lemmas 3 and 4] to the linear model of (10). Hence Lemma 3.1 gives
\[
h(Y_D | Y_W, H_D, H_W) = E\left[ \log\det\left( K_{Y_D} - K_{Y_D Y_W} K_{Y_W}^{-1} K_{Y_W Y_D} \right) \right] + m_D \log(\pi e),
\]
where $K_{Y_D} = \frac{P}{m_S} H_D H_D^\dagger + \sigma_D^2 I_{m_D}$, $K_{Y_W} = \frac{\alpha^2 P}{m_S} H_W H_W^\dagger + \sigma_W^2 I_{m_W}$, and $K_{Y_D Y_W} = \frac{\alpha P}{m_S} H_D H_W^\dagger$. Substituting this back into (12) and using the matrix inversion formula to simplify the resulting expression, we obtain the same expression on the first line of (21) for $I(X; Y_D, H_D) - I(Y_D, H_D; Y_W, H_W)$. Thus the upper bound in (21) is achievable with this choice of $X$; hence it is in fact the key capacity. ■
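The key capacity expression in (21) is an expectation over the channel matrices, so it can be estimated by Monte Carlo averaging. The following sketch assumes that the entries of $H_D$ and $H_W$ are i.i.d. circular-symmetric complex Gaussian with unit variance (an assumption made here for illustration); the function name `key_capacity_mc`, its default parameters, and the use of base-2 logarithms are likewise choices of this sketch.

```python
import numpy as np


def key_capacity_mc(m_s, m_d, m_w, P, alpha2=1.0, sigma2_d=1.0, sigma2_w=1.0,
                    trials=2000, seed=0):
    """Monte Carlo estimate (bits per channel use) of the log-det ratio in (21).

    Assumes i.i.d. CN(0, 1) channel entries; this is an illustrative assumption.
    """
    rng = np.random.default_rng(seed)
    eye = np.eye(m_s)
    total = 0.0
    for _ in range(trials):
        H_d = (rng.standard_normal((m_d, m_s))
               + 1j * rng.standard_normal((m_d, m_s))) / np.sqrt(2)
        H_w = (rng.standard_normal((m_w, m_s))
               + 1j * rng.standard_normal((m_w, m_s))) / np.sqrt(2)
        # Denominator matrix: I + (alpha^2 P / (m_S sigma_W^2)) H_W^H H_W
        den = eye + (alpha2 * P / (m_s * sigma2_w)) * (H_w.conj().T @ H_w)
        # Numerator matrix adds (P / (m_S sigma_D^2)) H_D^H H_D
        num = den + (P / (m_s * sigma2_d)) * (H_d.conj().T @ H_d)
        _, logdet_num = np.linalg.slogdet(num)
        _, logdet_den = np.linalg.slogdet(den)
        total += (logdet_num - logdet_den) / np.log(2)
    return total / trials
```

For example, `key_capacity_mc(1, 10, 10, P=100.0)` can be compared against the corresponding curve in Fig. 1; because the same seed yields the same channel draws, estimates at different values of `P` are directly comparable.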
In Fig. 1, the key capacities of several fast-fading MIMO channels with different numbers of source, destination, and eavesdropper antennas are plotted against the source signal-to-noise ratio (SNR) $P/\sigma^2$, where $\sigma_D^2 = \sigma_W^2 = \sigma^2$. The channel gain advantage of the eavesdropper is set to $\alpha^2 = 1$. We observe that the key capacity levels off as $P/\sigma^2$ increases in three of the four channels considered in Fig. 1, the exception being the case of $(m_S, m_D, m_W) = (2, 1, 1)$. It appears that the relative antenna dimensions determine the asymptotic behavior of the key capacity when the SNR is large. To study this behavior more precisely, we evaluate the limiting value of $C_K$ as the input power $P$ of the source becomes very large. To highlight the dependence of $C_K$ on $P$, we use the notation $C_K(P)$.
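Corollary 3.1: 1) If $m_W \ge m_S$, then
\[
\lim_{P \to \infty} C_K(P) = E\left[ \log \frac{\det\left( H_W^\dagger H_W + \frac{\sigma_W^2}{\alpha^2 \sigma_D^2} H_D^\dagger H_D \right)}{\det\left( H_W^\dagger H_W \right)} \right].
\]
2) Suppose that $m_W < m_S$. Define
\[
C_\infty(P) = E\left[ \log\det\left( I_{m_D} + \frac{P}{m_S \sigma_D^2} H_D \left[ I_{m_S} - H_W^\dagger \left( H_W H_W^\dagger \right)^{-1} H_W \right] H_D^\dagger \right) \right].
\]
Then $\lim_{P \to \infty} \frac{C_K(P)}{C_\infty(P)} = 1$.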
Fig. 1. Key capacities of fast-fading MIMO wiretap channels with different numbers of source, destination, and eavesdropper antennas. The eavesdropper's channel gain advantage is $\alpha^2 = 0$ dB, and $\sigma_D^2 = \sigma_W^2 = \sigma^2$.



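Proof: First fix $(\lambda_1, \lambda_2, \ldots, \lambda_{m_S}) = \left(\frac{P}{m_S}, \frac{P}{m_S}, \ldots, \frac{P}{m_S}\right)$, or equivalently $K_X = \frac{P}{m_S} I_{m_S}$, and consider the mapping $G$ defined in the proof of Lemma 3.2 as a function of $P$. Also define
\[
f(P) = \log\det\left( I_{m_D} + \frac{P}{m_S \sigma_D^2} H_D \left( I_{m_S} + \frac{\alpha^2 P}{m_S \sigma_W^2} H_W^\dagger H_W \right)^{-1} H_D^\dagger \right).
\]
Thus $C_K(P) = E[f(P)]$. It is not hard to check that for any $\tilde{P} \le P$, $G(P) \succeq G(\tilde{P})$, which implies that $\det(G(P)) \ge \det(G(\tilde{P}))$. Hence $f$ is increasing in $P$. Since the elements of $H_W$ are continuously i.i.d., $\mathrm{rank}(H_W^\dagger H_W) = \mathrm{rank}(H_W H_W^\dagger) = \mathrm{rank}(H_W) = \min(m_S, m_W)$ w.p.1. Thus the matrix $H_W^\dagger H_W$ (resp. $H_W H_W^\dagger$) is invertible w.p.1 when $m_W \ge m_S$ (resp. $m_W < m_S$).

Now, consider the case of $m_W \ge m_S$. As in (21), we have
\[
f(P) = \log \frac{\det\left( \frac{\sigma_W^2 m_S}{\alpha^2 P} I_{m_S} + H_W^\dagger H_W + \frac{\sigma_W^2}{\alpha^2 \sigma_D^2} H_D^\dagger H_D \right)}{\det\left( \frac{\sigma_W^2 m_S}{\alpha^2 P} I_{m_S} + H_W^\dagger H_W \right)}.
\]
Since $H_W^\dagger H_W$ is invertible w.p.1,
\[
\lim_{P \to \infty} f(P) = \log \frac{\det\left( H_W^\dagger H_W + \frac{\sigma_W^2}{\alpha^2 \sigma_D^2} H_D^\dagger H_D \right)}{\det\left( H_W^\dagger H_W \right)} \quad \text{w.p.1.}
\]
Hence Part 1) of the corollary results from monotone convergence.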

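For the case of $m_W < m_S$, the matrix inversion formula allows us to instead write
\[
f(P) = \log\det\left( I_{m_D} + \frac{P}{m_S \sigma_D^2} H_D \left[ I_{m_S} - H_W^\dagger \left( \frac{\sigma_W^2 m_S}{\alpha^2 P} I_{m_W} + H_W H_W^\dagger \right)^{-1} H_W \right] H_D^\dagger \right).
\]
Since $H_W H_W^\dagger$ is invertible w.p.1, we can also define
\[
f_\infty(P) = \log\det\left( I_{m_D} + \frac{P}{m_S \sigma_D^2} H_D \left[ I_{m_S} - H_W^\dagger \left( H_W H_W^\dagger \right)^{-1} H_W \right] H_D^\dagger \right).
\]
Note that $C_\infty(P) = E[f_\infty(P)]$. Since $H_W$ is of rank $m_W$ w.p.1, it has the singular value decomposition $H_W = U_W [S_W \;\; 0_{m_W \times (m_S - m_W)}] V_W^\dagger$, where $S_W = \mathrm{diag}(s_1, s_2, \ldots, s_{m_W})$ is a diagonal matrix whose diagonal elements are the positive singular values of $H_W$. Also let $V_W = [\tilde{V}_W \;\; \bar{V}_W]$, i.e., $\tilde{V}_W$ and $\bar{V}_W$ consist respectively of the first $m_W$ and the last $m_S - m_W$ columns of $V_W$. Employing the unitary property of $U_W$ and $V_W$, it is not hard to verify that
\[
f(P) = \log\det\left( I_{m_D} + \frac{P}{m_S \sigma_D^2} H_D \bar{V}_W \bar{V}_W^\dagger H_D^\dagger + H_D \tilde{V}_W \Lambda_W(P) \tilde{V}_W^\dagger H_D^\dagger \right) \tag{22}
\]
\[
f_\infty(P) = \log\det\left( I_{m_D} + \frac{P}{m_S \sigma_D^2} H_D \bar{V}_W \bar{V}_W^\dagger H_D^\dagger \right) \tag{23}
\]
where $\Lambda_W(P) = \frac{\sigma_W^2}{\alpha^2 \sigma_D^2} \left( \frac{\sigma_W^2 m_S}{\alpha^2 P} I_{m_W} + S_W^2 \right)^{-1}$. From (22) and (23), it is clear that $f_\infty(P) \le f(P)$.

Further let $t(P) = \mathrm{tr}\left[ H_D \tilde{V}_W \Lambda_W(P) \tilde{V}_W^\dagger H_D^\dagger \right]$. Since $t(P) I_{m_D} \succeq H_D \tilde{V}_W \Lambda_W(P) \tilde{V}_W^\dagger H_D^\dagger$,
\[
\begin{aligned}
f(P) &\le \log\det\left( [1 + t(P)] I_{m_D} + \frac{P}{m_S \sigma_D^2} H_D \bar{V}_W \bar{V}_W^\dagger H_D^\dagger \right) \\
&= m_D \log(1 + t(P)) + \log\det\left( I_{m_D} + \frac{P}{m_S \sigma_D^2 [1 + t(P)]} H_D \bar{V}_W \bar{V}_W^\dagger H_D^\dagger \right).
\end{aligned}
\tag{24}
\]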

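Let $\mu_1, \mu_2, \ldots, \mu_j$ be the positive eigenvalues of $H_D \bar{V}_W \bar{V}_W^\dagger H_D^\dagger$. Note that $1 \le j \le \min(m_D, m_S - m_W)$, because of the fact that the elements of $H_D$ are continuously i.i.d. and are independent of the elements of $H_W$. Hence, from (23), (24) and the fact that $f_\infty(P) \le f(P)$, we have
\[
\begin{aligned}
0 \le f(P) - f_\infty(P) &\le m_D \log(1 + t(P)) + \log \frac{\det\left( I_{m_D} + \frac{P}{m_S \sigma_D^2 [1 + t(P)]} H_D \bar{V}_W \bar{V}_W^\dagger H_D^\dagger \right)}{\det\left( I_{m_D} + \frac{P}{m_S \sigma_D^2} H_D \bar{V}_W \bar{V}_W^\dagger H_D^\dagger \right)} \\
&= m_D \log(1 + t(P)) + \sum_{i=1}^{j} \log \frac{\frac{1}{1 + t(P)} + \frac{m_S \sigma_D^2}{P \mu_i}}{1 + \frac{m_S \sigma_D^2}{P \mu_i}}.
\end{aligned}
\tag{25}
\]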



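Now note that
\[
\lim_{P \to \infty} t(P) = \frac{\sigma_W^2}{\alpha^2 \sigma_D^2} \mathrm{tr}\left[ H_D \tilde{V}_W S_W^{-2} \tilde{V}_W^\dagger H_D^\dagger \right] = \frac{\sigma_W^2}{\alpha^2 \sigma_D^2} \mathrm{tr}\left( \left[ H_W^{+\dagger} H_D^\dagger \right]^\dagger H_W^{+\dagger} H_D^\dagger \right),
\]
where $H_W^{+}$ denotes the Moore-Penrose pseudo-inverse of $H_W$ and $H_W^{+\dagger} = (H_W^{+})^\dagger$. Then (25) implies that
\[
0 \le \liminf_{P \to \infty} [f(P) - f_\infty(P)] \le \limsup_{P \to \infty} [f(P) - f_\infty(P)] \le (m_D - j) \log\left( 1 + \frac{\sigma_W^2}{\alpha^2 \sigma_D^2} \mathrm{tr}\left( \left[ H_W^{+\dagger} H_D^\dagger \right]^\dagger H_W^{+\dagger} H_D^\dagger \right) \right).
\]
Hence by Fatou's lemma, we get
\[
\begin{aligned}
0 &\le \liminf_{P \to \infty} [C_K(P) - C_\infty(P)] \\
&\le \limsup_{P \to \infty} [C_K(P) - C_\infty(P)] \\
&\le E\left[ (m_D - j) \log\left( 1 + \frac{\sigma_W^2}{\alpha^2 \sigma_D^2} \mathrm{tr}\left( \left[ H_W^{+\dagger} H_D^\dagger \right]^\dagger H_W^{+\dagger} H_D^\dagger \right) \right) \right].
\end{aligned}
\tag{26}
\]
From (23), it is clear that $f_\infty(P)$ increases without bound in $P$ w.p.1; hence $C_\infty(P)$ also increases without bound. Combining this fact with (26), we arrive at the conclusion of Part 2) of the corollary. ■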
Part 1) of the corollary verifies the observation from Fig. 1 that the key capacity levels off as the SNR increases if the number of source antennas is no larger than that of eavesdropper antennas. When the source has more antennas, Part 2) of the corollary suggests that the key capacity can grow without bound as $P$ increases, similarly to a MIMO fading channel with capacity $C_\infty(P)$. Note that the matrix $I_{m_S} - H_W^\dagger (H_W H_W^\dagger)^{-1} H_W$ in the expression that defines $C_\infty(P)$ is a projection matrix onto the orthogonal complement of the column space of $H_W^\dagger$. Thus $C_\infty(P)$ has the physical interpretation that the secret information is passed across the dimensions not observable by the eavesdropper. The most interesting aspect is that this mode of operation can be achieved even if neither the source nor the destination knows the channel matrix $H_W$.
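The projection interpretation can be illustrated numerically. The sketch below draws a random $H_W$ with $m_W < m_S$ (the sizes, the seed, and the Gaussian entries are assumptions of this example) and checks that $I_{m_S} - H_W^\dagger (H_W H_W^\dagger)^{-1} H_W$ is idempotent, is annihilated by $H_W$, and has rank $m_S - m_W$.

```python
import numpy as np

rng = np.random.default_rng(1)
m_s, m_w = 4, 2  # hypothetical antenna counts with m_W < m_S

# Random eavesdropper channel (illustrative CN(0, 1) entries).
H_w = (rng.standard_normal((m_w, m_s))
       + 1j * rng.standard_normal((m_w, m_s))) / np.sqrt(2)

# Candidate projector onto the directions the eavesdropper cannot observe.
proj = np.eye(m_s) - H_w.conj().T @ np.linalg.inv(H_w @ H_w.conj().T) @ H_w

is_idempotent = bool(np.allclose(proj @ proj, proj))   # P^2 = P
is_hermitian = bool(np.allclose(proj, proj.conj().T))  # P = P^H
is_unobservable = bool(np.allclose(H_w @ proj, 0.0))   # H_W P = 0
proj_rank = int(round(np.trace(proj).real))            # rank equals trace
```

Signals sent along the range of this projector produce no output at the eavesdropper, which is the sense in which $m_S - m_W$ dimensions remain unobservable.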

We note that the asymptotic behavior of the key capacity in the high SNR regime summarized in Corollary 3.1 is similar to the idea of secrecy degree of freedom introduced in [35]. The subtle difference here is that no up-to-date CSI of the destination channel is needed at the source.

Another interesting observation from Fig. 1 is that for the case of $(m_S, m_D, m_W) = (1, 10, 10)$, the source power $P$ seems to have little effect on the key capacity. A small amount of source power is enough to get close to the leveling key capacity of about 1 bit per channel use. This observation is generalized below by Corollary 3.2, which characterizes the effect of spatial dimensionality of the destination and eavesdropper on the key capacity when the destination and eavesdropper both have a large number of antennas.
Fig. 2. Key capacities of fast-fading MIMO wiretap channels with different numbers of source, destination, and eavesdropper antennas. The source signal-to-noise ratio is $P/\sigma^2 = 10$ dB, where $\sigma_D^2 = \sigma_W^2 = \sigma^2$.



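Corollary 3.2: When $m_D$ and $m_W$ approach infinity in such a way that $\lim_{m_D, m_W \to \infty} \frac{m_W}{m_D} = \beta$,
\[
C_K \to m_S \log\left( 1 + \frac{\sigma_W^2}{\alpha^2 \beta \sigma_D^2} \right).
\]
Proof: This corollary is a direct consequence of the fact that $\frac{1}{m_D} H_D^\dagger H_D \to I_{m_S}$ and $\frac{1}{m_W} H_W^\dagger H_W \to I_{m_S}$ w.p.1, which is in turn due to the strong law of large numbers.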
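The large-antenna limit can be compared with a Monte Carlo estimate of the key capacity. The sketch below assumes i.i.d. unit-variance circular-symmetric complex Gaussian channel entries and takes the limiting value to be $m_S \log_2\left(1 + \sigma_W^2 / (\alpha^2 \beta \sigma_D^2)\right)$, which is what the law-of-large-numbers argument yields when applied to the log-det ratio form of the key capacity; the function names, antenna counts, and seed are illustrative.

```python
import numpy as np


def key_capacity_estimate(m_s, m_d, m_w, P, alpha2=1.0, s2d=1.0, s2w=1.0,
                          trials=200, seed=0):
    """Monte Carlo estimate (bits) of the key capacity log-det ratio.

    Channel entries are assumed i.i.d. CN(0, 1) for illustration.
    """
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(trials):
        H_d = (rng.standard_normal((m_d, m_s))
               + 1j * rng.standard_normal((m_d, m_s))) / np.sqrt(2)
        H_w = (rng.standard_normal((m_w, m_s))
               + 1j * rng.standard_normal((m_w, m_s))) / np.sqrt(2)
        den = np.eye(m_s) + (alpha2 * P / (m_s * s2w)) * (H_w.conj().T @ H_w)
        num = den + (P / (m_s * s2d)) * (H_d.conj().T @ H_d)
        total += (np.linalg.slogdet(num)[1] - np.linalg.slogdet(den)[1]) / np.log(2)
    return total / trials


def large_antenna_limit(m_s, beta, alpha2=1.0, s2d=1.0, s2w=1.0):
    """Limiting key capacity (bits) as m_D, m_W grow with m_W / m_D -> beta."""
    return m_s * np.log2(1.0 + s2w / (alpha2 * beta * s2d))


estimate = key_capacity_estimate(m_s=2, m_d=200, m_w=200, P=10.0)
limit = large_antenna_limit(m_s=2, beta=1.0)
```

With 200 antennas at both receivers the estimate should already sit close to the limit, illustrating how weakly the source power enters once the receive arrays are large.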



Note that we can interpret the ratio $\beta$ as the spatial dimensionality advantage of the eavesdropper over the destination. The expression for the limiting $C_K$ in the corollary clearly indicates that this spatial dimensionality advantage affects the key capacity in the same way as the channel gain advantage $\alpha^2$.

In Fig. 2, the key capacities of several fast-fading MIMO channels with different numbers of source, destination, and eavesdropper antennas are plotted against the eavesdropper's channel gain advantage $\alpha^2$, with $P/\sigma^2 = 10$ dB. The results in Fig. 2 show the other effect of spatial dimensionality. We observe that the key capacity decreases almost reciprocally with $\alpha^2$ in the channels with $(m_S, m_D, m_W) = (1, 1, 1)$ and $(m_S, m_D, m_W) = (2, 2, 2)$, but stays almost constant for the channel with $(m_S, m_D, m_W) = (2, 1, 1)$. It seems that the relative numbers of source and eavesdropper antennas again play the main role in differentiating these two behaviors of the key capacity. To verify this, we evaluate the limiting value of $C_K$ as the gain advantage $\alpha^2$ of the eavesdropper becomes very large. To highlight the dependence of $C_K$ on $\alpha^2$, we use the notation $C_K(\alpha^2)$.

Corollary 3.3:
\[
\lim_{\alpha^2 \to \infty} C_K(\alpha^2) = \begin{cases} 0 & \text{if } m_W \ge m_S \\ C_\infty(P) & \text{if } m_W < m_S. \end{cases}
\]
Proof: Similar to the proof of Corollary 3.1. ■

Similar to the case of large SNR, when the number of source antennas is larger than that of the eavesdropper's antennas, secret information can be passed across the dimensions not observable by the eavesdropper. This can be achieved with neither the source nor the destination knowing the channel matrix $H_W$.
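The two regimes of Corollary 3.3 can be illustrated for a single channel realization. In the sketch below, `f_rate` evaluates the per-realization log-det term inside the key capacity expectation and `f_inf` evaluates the term inside $C_\infty(P)$; both function names, the antenna counts, the seed, and the Gaussian channel entries are assumptions of this example.

```python
import numpy as np


def crandn(rng, rows, cols):
    """Illustrative helper: i.i.d. CN(0, 1) entries."""
    return (rng.standard_normal((rows, cols))
            + 1j * rng.standard_normal((rows, cols))) / np.sqrt(2)


def f_rate(H_d, H_w, P, alpha2, s2d=1.0, s2w=1.0):
    """log2 det(I + P/(m_S s2d) H_D (I + alpha2 P/(m_S s2w) H_W^H H_W)^{-1} H_D^H)."""
    m_d, m_s = H_d.shape
    inner = np.eye(m_s) + (alpha2 * P / (m_s * s2w)) * (H_w.conj().T @ H_w)
    mat = np.eye(m_d) + (P / (m_s * s2d)) * (H_d @ np.linalg.inv(inner) @ H_d.conj().T)
    return np.linalg.slogdet(mat)[1] / np.log(2)


def f_inf(H_d, H_w, P, s2d=1.0):
    """log2 det(I + P/(m_S s2d) H_D [I - H_W^H (H_W H_W^H)^{-1} H_W] H_D^H)."""
    m_d, m_s = H_d.shape
    proj = np.eye(m_s) - H_w.conj().T @ np.linalg.inv(H_w @ H_w.conj().T) @ H_w
    mat = np.eye(m_d) + (P / (m_s * s2d)) * (H_d @ proj @ H_d.conj().T)
    return np.linalg.slogdet(mat)[1] / np.log(2)


rng = np.random.default_rng(3)
P, big_alpha2 = 10.0, 1e8

# Case m_W < m_S: the rate approaches the projected-channel rate.
H_d, H_w = crandn(rng, 2, 3), crandn(rng, 1, 3)
gap = f_rate(H_d, H_w, P, big_alpha2) - f_inf(H_d, H_w, P)

# Case m_W >= m_S: the rate vanishes.
H_d2, H_w2 = crandn(rng, 2, 2), crandn(rng, 2, 2)
residual = f_rate(H_d2, H_w2, P, big_alpha2)
```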

IV. Alternative Achievability of Key Capacity 

In this section, we provide an alternative proof of achievability for key capacity, which does not 
require the transmission of continuous symbols over the public channel. We derive the result from "first 
principles", which provides more insight on the desirable structure of a practical key agreement scheme. 
The main steps of the key agreement procedure are the following: 

1) the source sends a sequence of i.i.d. symbols $X^n$;

2) the destination "quantizes" its received sequence $Y^n$ into $\hat{Y}^n$ with a Wyner-Ziv compression scheme;

3) the destination uses a binning scheme with the quantized symbol sequences to determine the secret key and the information to feed back to the source over the public channel;

4) the source exploits the information sent by the destination to reconstruct the destination's quantized sequence $\hat{Y}^n$ and uses the same binning scheme to generate its secret key.

The secrecy of the resulting key is established by carefully structuring the binning scheme. 

For the memoryless wiretap channel $(X, Y, Z)$ specified by the joint pdf $p(y|x)p(z|x)p(x)$, consider the quadruple $(X, Y, \hat{Y}, Z)$ defined by the joint pdf $p(x, y, \hat{y}, z) = p(\hat{y}|y)p(y|x)p(z|x)p(x)$ with $p(\hat{y}|y)$ to be specified later. We assume that $\hat{Y}$ takes values in the alphabet $\hat{\mathcal{Y}}$. Given a sequence of $n$ elements $x^n = (x_1, x_2, \ldots, x_n)$, $p(x^n) = \prod_{j=1}^{n} p(x_j)$ unless otherwise specified. Similar notation and convention apply to all other sequences as well as their corresponding pdfs and conditional pdfs considered hereafter.

A. Random Code Generation 

Choose $p(\hat{y}|y)$ such that $I(X; \hat{Y}) - I(\hat{Y}; Z) > 0$ and $I(\hat{Y}; Z) > 0$, and let $p(\hat{y})$ denote the corresponding marginal. Note that the existence of such $p(\hat{y}|y)$ can be assumed without loss of generality if $I(X; Y) - I(Y; Z) > 0$ and $I(Y; Z) > 0$. If $I(X; Y) - I(Y; Z) = 0$, there is nothing to prove. Similarly, if $I(Y; Z) = 0$, the construction below can be trivially modified to show that $I(X; Y)$ is an achievable key rate.

Fix a small $\epsilon > 0$ (small enough so that the various rate definitions and bounds on probabilities below make sense and are non-trivial). Let us define
\[
\begin{aligned}
R_1 &\triangleq I(Y; \hat{Y}) + 4\epsilon \\
R_2 &\triangleq I(Y; \hat{Y}) - I(X; \hat{Y}) + 22\epsilon \\
R_3 &\triangleq I(X; \hat{Y}) - I(\hat{Y}; Z) - \epsilon \\
R_4 &\triangleq I(\hat{Y}; Z) - 17\epsilon.
\end{aligned}
\tag{27}
\]
For each $j = 1, 2, \ldots, 2^{nR_2}$ and $l = 1, 2, \ldots, 2^{nR_3}$, generate $2^{nR_4}$ codewords $\hat{Y}^n(j, l, 1), \hat{Y}^n(j, l, 2), \ldots, \hat{Y}^n(j, l, 2^{nR_4})$ according to $p(\hat{y}^n)$. The set of codewords $\{\hat{Y}^n(j, l, k)\}$ with $k = 1, \ldots, 2^{nR_4}$ forms a subcode denoted by $C(j, l)$. The union of all subcodes $C(j, l)$ for $j = 1, 2, \ldots, 2^{nR_2}$ and $l = 1, 2, \ldots, 2^{nR_3}$ forms the code $C$. For convenience, we denote the $2^{nR_1}$ codewords in $C$ as $\hat{Y}^n(1), \hat{Y}^n(2), \ldots, \hat{Y}^n(2^{nR_1})$, where $\hat{Y}^n(j + (l - 1)2^{nR_2} + (w - 1)2^{n(R_2 + R_3)}) = \hat{Y}^n(j, l, w)$ for $j = 1, 2, \ldots, 2^{nR_2}$, $l = 1, 2, \ldots, 2^{nR_3}$, and $w = 1, 2, \ldots, 2^{nR_4}$. The code $C$ and its subcodes $C(j, l)$ are revealed to the source, destination, and eavesdropper. In the following, we refer to a codeword or its index in $C$ interchangeably. Under this convention, the subcode $C(j, l)$ is also the set that contains all the indices of its codewords. Denote $C(j) = \bigcup_{l=1}^{2^{nR_3}} C(j, l)$ and $C(l) = \bigcup_{j=1}^{2^{nR_2}} C(j, l)$.
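The bookkeeping of the code construction can be sketched in a few lines. The example below assumes the rate definitions read as $R_1 = I(Y;\hat{Y}) + 4\epsilon$, $R_2 = I(Y;\hat{Y}) - I(X;\hat{Y}) + 22\epsilon$, $R_3 = I(X;\hat{Y}) - I(\hat{Y};Z) - \epsilon$, and $R_4 = I(\hat{Y};Z) - 17\epsilon$, so that $R_1 = R_2 + R_3 + R_4$, and it uses small hypothetical bin counts `n2`, `n3`, `n4` in place of $2^{nR_2}$, $2^{nR_3}$, $2^{nR_4}$. The function names are illustrative.

```python
def rates(i_y_yhat, i_x_yhat, i_yhat_z, eps):
    """Return (R1, R2, R3, R4) from the three mutual informations and epsilon."""
    r1 = i_y_yhat + 4 * eps
    r2 = i_y_yhat - i_x_yhat + 22 * eps
    r3 = i_x_yhat - i_yhat_z - eps
    r4 = i_yhat_z - 17 * eps
    return r1, r2, r3, r4


def flat_index(j, l, w, n2, n3):
    """Map the 1-based triple (j, l, w) to the 1-based codeword index m."""
    return j + (l - 1) * n2 + (w - 1) * n2 * n3


def split_index(m, n2, n3):
    """Inverse of flat_index: recover the 1-based (j, l, w) from m."""
    w0, rem = divmod(m - 1, n2 * n3)
    l0, j0 = divmod(rem, n2)
    return j0 + 1, l0 + 1, w0 + 1


# Hypothetical small bin counts standing in for 2^{nR_2}, 2^{nR_3}, 2^{nR_4}.
n2, n3, n4 = 4, 2, 8
indices = [flat_index(j, l, w, n2, n3)
           for w in range(1, n4 + 1)
           for l in range(1, n3 + 1)
           for j in range(1, n2 + 1)]
```

The index $J$ (public feedback) and the index $L$ (key) are then simply the first two components returned by `split_index` for the quantizer output $M$.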



B. Secret Sharing Procedure 

For convenience, we define the joint typicality indicator function $T_\epsilon(\cdot)$ that takes in a number of sequences as its arguments. The value of $T_\epsilon(\cdot)$ is 1 if the sequences are $\epsilon$-jointly typical, and the value is 0 otherwise. Further define the indicator function for the sequence pair $(y^n, \hat{y}^n)$:
\[
S_\epsilon(y^n, \hat{y}^n) = \begin{cases} 1 & \text{if } \Pr\{T_\epsilon(X^n, y^n, \hat{y}^n, Z^n) = 1\} > 1 - \epsilon \\ 0 & \text{otherwise} \end{cases}
\]
where $(X^n, Z^n)$ is distributed according to $p(x^n, z^n | y^n, \hat{y}^n)$ in the definition above.

The source generates a random sequence $X^n$ distributed according to $p(x^n)$. If $X^n$ satisfies the average power constraint (1), the source sends $X^n$ through the $(X, Y, Z)$ channel. Otherwise, it ends the secret-sharing process. Since $p(x)$ satisfies $E[|X|^2] \le P$, the law of large numbers implies that the probability of the latter event can be made arbitrarily small by increasing $n$. Hence we can assume below, with no loss of generality, that $X^n$ satisfies (1) and is sent by the source. This assumption helps to make the probability calculations in Section IV-C less tedious.

Upon reception of the sequence $Y^n$, the destination tries to quantize the received sequence. Let $M$ be the output of its quantizer. Specifically, if there is a unique sequence $\hat{Y}^n(m) \in C$ for some $m \in \{1, 2, \ldots, 2^{nR_1}\}$ such that $S_\epsilon(Y^n, \hat{Y}^n(m)) = 1$, then it sets the output of the quantizer to $M = m$. If there is more than one such sequence, $M$ is set to be the smallest such sequence index $m$. If there is no such sequence, it sets $M = 0$. Let $L$ and $J$ be the unique indices such that $\hat{Y}^n(M) \in C(J, L)$. The index $L$ will be used as the key while the index $J$ is fed back to the source over the public channel, i.e., the message sent over the public channel is $J$. If $M = 0$, set $J = 0$ and choose $L$ randomly over $\{1, 2, \ldots, 2^{nR_3}\}$ with uniform probabilities.

After receiving the feedback information $J$ via the public channel, the source attempts to find a unique $\hat{Y}^n(m) \in C$ such that $T_\epsilon(X^n, \hat{Y}^n(m)) = 1$ and $m \in C(J)$. If there is such a unique $\hat{Y}^n(m)$, the source decodes $\hat{M} = m$. If there is no such sequence or more than one such sequence, the source sets $\hat{M} = 0$. If $J = 0$, it sets $\hat{M} = 0$. Finally, if $\hat{M} > 0$, the source generates its key $K = k$, such that $\hat{M} \in C(J, k)$. If $\hat{M} = 0$, it sets $K = 0$.

We also consider a fictitious receiver who observes the sequence $Z^n$ and obtains both indices $J$ and $L$ via the public channel. This receiver sets $\tilde{M} = 0$ if $J = 0$. Otherwise, it attempts to find a unique $\hat{Y}^n(m) \in C$ such that $T_\epsilon(\hat{Y}^n(m), Z^n) = 1$ and $m \in C(J, L)$. If there is such a unique $\hat{Y}^n(m)$, the receiver decodes $\tilde{M} = m$. If there is no such sequence or more than one such sequence, the receiver sets $\tilde{M} = 0$.



C. Analysis of Probability of Error 



We use a random coding argument to establish the existence of a code with rates given by (27) such that $\Pr\{K \ne L\}$ and $\Pr\{\tilde{M} \ne M\}$ vanish in the limit of large block length $n$. Unless otherwise stated, the probabilities of the events below are taken over the joint distribution of the codebook $C$, the codewords, and all other random quantities involved.

Before we proceed, we introduce the following lemma regarding the indicator function $S_\epsilon$.
Lemma 4.1: 1) If $(Y^n, \hat{Y}^n)$ is distributed according to $p(y^n, \hat{y}^n)$, then $\Pr\{S_\epsilon(Y^n, \hat{Y}^n) = 1\} > 1 - \epsilon$ for sufficiently large $n$.

2) If $\hat{Y}^n$ is distributed according to $p(\hat{y}^n)$, then $\Pr\{S_\epsilon(y^n, \hat{Y}^n) = 1\} \le \frac{2^{-n(R_1 - 7\epsilon)}}{1 - \epsilon}$ for all $y^n$.

3) If $Y^n$ is distributed according to $p(y^n)$, then $\Pr\{S_\epsilon(Y^n, \hat{y}^n) = 1\} \le \frac{2^{-n(R_1 - 7\epsilon)}}{1 - \epsilon}$ for all $\hat{y}^n$.

4) If $(Y^n, \hat{Y}^n)$ is distributed according to $p(y^n)p(\hat{y}^n)$, then $\Pr\{S_\epsilon(Y^n, \hat{Y}^n) = 1\} \ge (1 - \epsilon) \cdot 2^{-n(R_1 - \epsilon)}$ for sufficiently large $n$.

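Proof:
1) This claim is actually shown in [36]. We briefly sketch the proof here using our notation for completeness and easy reference. By the reverse Markov inequality [36],
\[
\Pr\{S_\epsilon(Y^n, \hat{Y}^n) = 1\} \ge 1 - \frac{1 - \Pr\{T_\epsilon(X^n, Y^n, \hat{Y}^n, Z^n) = 1\}}{\epsilon} > 1 - \epsilon,
\]
where the second inequality is due to the fact that $\Pr\{T_\epsilon(X^n, Y^n, \hat{Y}^n, Z^n) = 1\} > 1 - \epsilon^2$ for sufficiently large $n$.

2) First, we only need to consider typical $y^n$ since the bound is trivial when $y^n$ is not typical. Notice that for any such $y^n$,
\[
\begin{aligned}
1 &\ge \int T_\epsilon(x^n, y^n, \hat{y}^n, z^n) \, p(x^n, \hat{y}^n, z^n | y^n) \, dx^n \, dz^n \, d\hat{y}^n \\
&= \int \Pr\{T_\epsilon(X^n, y^n, \hat{y}^n, Z^n) = 1\} \cdot \frac{p(y^n, \hat{y}^n)}{p(y^n)} \, d\hat{y}^n \\
&\ge \int \Pr\{T_\epsilon(X^n, y^n, \hat{y}^n, Z^n) = 1\} \cdot \frac{2^{-n(h(Y, \hat{Y}) + \epsilon)}}{2^{-n(h(Y) - \epsilon)}} \, d\hat{y}^n.
\end{aligned}
\]
Hence
\[
\begin{aligned}
2^{n(h(\hat{Y}|Y) + 2\epsilon)} &\ge \int \Pr\{T_\epsilon(X^n, y^n, \hat{y}^n, Z^n) = 1\} \, d\hat{y}^n \\
&\ge \int S_\epsilon(y^n, \hat{y}^n) \cdot \Pr\{T_\epsilon(X^n, y^n, \hat{y}^n, Z^n) = 1\} \, d\hat{y}^n \\
&\ge (1 - \epsilon) \int S_\epsilon(y^n, \hat{y}^n) \, d\hat{y}^n.
\end{aligned}
\tag{28}
\]
Now
\[
\Pr\{S_\epsilon(y^n, \hat{Y}^n) = 1\} = \int S_\epsilon(y^n, \hat{y}^n) \, p(\hat{y}^n) \, d\hat{y}^n \le \frac{2^{-n(I(Y; \hat{Y}) - 3\epsilon)}}{1 - \epsilon},
\]
where the last inequality is due to (28).

3) Same as Part 2), interchanging the roles of $y^n$ and $\hat{y}^n$.

4) From Part 1), we get
\[
\begin{aligned}
1 - \epsilon &< \int\!\!\int S_\epsilon(y^n, \hat{y}^n) \, p(y^n, \hat{y}^n) \, dy^n \, d\hat{y}^n \\
&\le \int\!\!\int S_\epsilon(y^n, \hat{y}^n) \cdot \frac{2^{-n(h(Y, \hat{Y}) - \epsilon)}}{2^{-n(h(Y) + \epsilon)} \cdot 2^{-n(h(\hat{Y}) + \epsilon)}} \cdot p(y^n) p(\hat{y}^n) \, dy^n \, d\hat{y}^n \\
&= 2^{n(I(Y; \hat{Y}) + 3\epsilon)} \Pr\{S_\epsilon(Y^n, \hat{Y}^n) = 1\}.
\end{aligned}
\]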



Moreover we need to bound the probabilities of the following events pertaining to $M$.

Lemma 4.2: 1) $\Pr\{M = 0\} < 2\epsilon$ for sufficiently large $n$.

2) For $m = 1, 2, \ldots, 2^{nR_1}$, $\Pr\{M = m\} \le \frac{2^{-n(R_1 - 7\epsilon)}}{1 - \epsilon}$.

3) When $n$ is sufficiently large, $\Pr\{M = m\} \ge \left[ 1 - \frac{2^{-n(R_1 - 7\epsilon)}}{1 - \epsilon} \right]^{m-1} \cdot (1 - \epsilon) 2^{-n(R_1 - \epsilon)}$ uniformly for all $m = 1, 2, \ldots, 2^{nR_1}$.

4) When $n$ is sufficiently large, $\Pr\{J = j, L = l\} \ge (1 - \epsilon)^3 \cdot 2^{-n(R_1 - R_4 + 6\epsilon)}$ uniformly for all $j = 1, 2, \ldots, 2^{nR_2}$ and $l = 1, 2, \ldots, 2^{nR_3}$.

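Proof:

1) We will use an argument similar to the one in the achievability proof of the rate distortion function in [27, Section 10.5] to bound $\Pr\{M = 0\}$. First note that $\{M = 0\}$ is the event that $S_\epsilon(Y^n, \hat{Y}^n(m)) = 0$ for all $m \in \{1, 2, \ldots, 2^{nR_1}\}$, and hence
\[
\Pr\{M = 0\} = \Pr\left\{ \bigcap_{m=1}^{2^{nR_1}} \{S_\epsilon(Y^n, \hat{Y}^n(m)) = 0\} \right\} = \int \left[ \Pr\{S_\epsilon(y^n, \hat{Y}^n(1)) = 0\} \right]^{2^{nR_1}} p(y^n) \, dy^n,
\tag{29}
\]
where the second equality is due to the fact that $\hat{Y}^n(1), \ldots, \hat{Y}^n(2^{nR_1})$ are i.i.d. given each fixed $y^n$. But
\[
\begin{aligned}
\left[ \Pr\{S_\epsilon(y^n, \hat{Y}^n(1)) = 0\} \right]^{2^{nR_1}}
&= \left[ 1 - \int S_\epsilon(y^n, \hat{y}^n) p(\hat{y}^n) \, d\hat{y}^n \right]^{2^{nR_1}} \\
&= \left[ 1 - \int S_\epsilon(y^n, \hat{y}^n) p(\hat{y}^n | y^n) \frac{p(y^n) p(\hat{y}^n)}{p(y^n, \hat{y}^n)} \, d\hat{y}^n \right]^{2^{nR_1}} \\
&\le \left[ 1 - \int S_\epsilon(y^n, \hat{y}^n) p(\hat{y}^n | y^n) \frac{2^{-n(h(Y) + \epsilon)} \cdot 2^{-n(h(\hat{Y}) + \epsilon)}}{2^{-n(h(Y, \hat{Y}) - \epsilon)}} \, d\hat{y}^n \right]^{2^{nR_1}} \\
&= \left[ 1 - 2^{-n(I(Y; \hat{Y}) + 3\epsilon)} \int S_\epsilon(y^n, \hat{y}^n) p(\hat{y}^n | y^n) \, d\hat{y}^n \right]^{2^{nR_1}} \\
&\le 1 - \int S_\epsilon(y^n, \hat{y}^n) p(\hat{y}^n | y^n) \, d\hat{y}^n + \exp(-2^{n\epsilon}),
\end{aligned}
\tag{30}
\]
where the inequality on the third line is due to the fact that $S_\epsilon(y^n, \hat{y}^n) = 1$ implies $T_\epsilon(y^n, \hat{y}^n) = 1$, and the last line results from the inequality $(1 - xy)^k \le 1 - x + e^{-yk}$ for all $0 \le x, y \le 1$ and positive integer $k$ [27, Lemma 10.5.3]. Substituting (30) back into (29) and using Lemma 4.1 Part 1), we get
\[
\Pr\{M = 0\} \le 1 - \Pr\{S_\epsilon(Y^n, \hat{Y}^n) = 1\} + \exp(-2^{n\epsilon}) < \epsilon + \epsilon = 2\epsilon
\]
for sufficiently large $n$.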
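2) Notice that for $m = 1, 2, \ldots, 2^{nR_1}$,
\[
\begin{aligned}
\Pr\{M = m\} &= \Pr\{S_\epsilon(Y^n, \hat{Y}^n(m)) = 1, S_\epsilon(Y^n, \hat{Y}^n(m - 1)) = 0, \ldots, S_\epsilon(Y^n, \hat{Y}^n(1)) = 0\} \\
&= \int \Pr\{S_\epsilon(y^n, \hat{Y}^n(1)) = 1\} \left[ \Pr\{S_\epsilon(y^n, \hat{Y}^n(1)) = 0\} \right]^{m-1} p(y^n) \, dy^n
\end{aligned}
\tag{31}
\]
where the second equality results from the i.i.d. nature of $\hat{Y}^n(1), \ldots, \hat{Y}^n(m)$. Thus we have
\[
\Pr\{M = m\} \le \Pr\{S_\epsilon(Y^n, \hat{Y}^n(1)) = 1\} \le \frac{2^{-n(R_1 - 7\epsilon)}}{1 - \epsilon},
\]
where the last inequality is due to Part 2) of Lemma 4.1 since $Y^n$ and $\hat{Y}^n(1)$ are independent.

3) From (31), we have the lower bound
\[
\begin{aligned}
\Pr\{M = m\} &\ge \left[ 1 - \frac{2^{-n(R_1 - 7\epsilon)}}{1 - \epsilon} \right]^{m-1} \Pr\{S_\epsilon(Y^n, \hat{Y}^n(1)) = 1\} \\
&\ge \left[ 1 - \frac{2^{-n(R_1 - 7\epsilon)}}{1 - \epsilon} \right]^{m-1} \cdot (1 - \epsilon) 2^{-n(R_1 - \epsilon)},
\end{aligned}
\]
where the first inequality is due to Part 2) of Lemma 4.1 and the second inequality is from Part 4) of Lemma 4.1 when $n$ is sufficiently large. Note that the same sufficiently large $n$ is enough to guarantee the validity of the lower bound above for all $m = 1, 2, \ldots, 2^{nR_1}$.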
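4) First note that, for $j = 1, 2, \ldots, 2^{nR_2}$ and $l = 1, 2, \ldots, 2^{nR_3}$,
\[
\Pr\{J = j, L = l\} = \sum_{m \in C(j, l)} \Pr\{M = m\} = \sum_{w=1}^{2^{nR_4}} \Pr\{M = j + (l - 1)2^{nR_2} + (w - 1)2^{n(R_2 + R_3)}\}.
\]
Thus applying Part 3) of the lemma and writing $q = \frac{2^{-n(R_1 - 7\epsilon)}}{1 - \epsilon}$, we get
\[
\begin{aligned}
\Pr\{J = j, L = l\}
&\ge (1 - \epsilon) 2^{-n(R_1 - \epsilon)} \cdot \sum_{w=1}^{2^{nR_4}} [1 - q]^{\,j - 1 + (l - 1)2^{nR_2} + (w - 1)2^{n(R_2 + R_3)}} \\
&\ge (1 - \epsilon) 2^{-n(R_1 - \epsilon)} \cdot \sum_{w=1}^{2^{nR_4}} [1 - q]^{\,w 2^{n(R_2 + R_3)}} \\
&= (1 - \epsilon) 2^{-n(R_1 - \epsilon)} \cdot [1 - q]^{2^{n(R_2 + R_3)}} \cdot \frac{1 - [1 - q]^{2^{nR_1}}}{1 - [1 - q]^{2^{n(R_2 + R_3)}}} \\
&\ge (1 - \epsilon) 2^{-n(R_1 - \epsilon)} \cdot \left[ 1 - \frac{2^{-n(R_4 - 7\epsilon)}}{1 - \epsilon} \right] \cdot \frac{1 - [1 - q]^{2^{nR_1}}}{2^{-n(R_4 - 7\epsilon)} / (1 - \epsilon)} \\
&\ge (1 - \epsilon)^2 \cdot 2^{-n(R_1 - R_4 + 6\epsilon)} \cdot \left[ 1 - \frac{2^{-n(R_4 - 7\epsilon)}}{1 - \epsilon} \right] \cdot \left[ 1 - \exp\left( -\frac{2^{7n\epsilon}}{1 - \epsilon} \right) \right],
\end{aligned}
\tag{32}
\]
which is at least $(1 - \epsilon)^3 \cdot 2^{-n(R_1 - R_4 + 6\epsilon)}$ uniformly for all $j = 1, 2, \ldots, 2^{nR_2}$ and $l = 1, 2, \ldots, 2^{nR_3}$ when $n$ is sufficiently large. The lower bound on the fourth line of (32) above is obtained from the inequality $(1 - x)^k \ge 1 - kx$ for any $0 \le x \le 1$ and positive integer $k$. The lower bound on the fifth line is in turn based on the inequality $(1 - x)^k \le e^{-kx}$ for $0 \le x \le 1$ and positive integer $k$.
■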
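The three elementary inequalities used in the proof of Lemma 4.2, namely $(1 - xy)^k \le 1 - x + e^{-yk}$, $(1 - x)^k \ge 1 - kx$, and $(1 - x)^k \le e^{-kx}$ for $0 \le x, y \le 1$ and positive integer $k$, can be spot-checked on a grid. The sketch below is only a numerical illustration, not a proof; the grid resolution, the range of $k$, and the small floating-point tolerance are choices made for this example.

```python
import math


def check_inequalities(xs, ys, ks, tol=1e-12):
    """Return True if all three inequalities hold on the given grid."""
    for k in ks:
        for x in xs:
            power = (1 - x) ** k
            if power < 1 - k * x - tol:          # (1 - x)^k >= 1 - kx
                return False
            if power > math.exp(-k * x) + tol:   # (1 - x)^k <= exp(-kx)
                return False
            for y in ys:
                # (1 - xy)^k <= 1 - x + exp(-yk)
                if (1 - x * y) ** k > 1 - x + math.exp(-y * k) + tol:
                    return False
    return True


grid = [i / 20 for i in range(21)]  # 0, 0.05, ..., 1
all_hold = check_inequalities(grid, grid, range(1, 50))
```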

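We first consider the error event $\{K \ne L\}$. Note that
\[
\begin{aligned}
\Pr\{K \ne L\} &= \Pr\{M = 0\} + \Pr\{M > 0, K \ne L\} \\
&= \Pr\{M = 0\} + \sum_{m=1}^{2^{nR_1}} \Pr\{\mathcal{E}_m \cup \tilde{\mathcal{E}}_m, M = m\} \\
&\le \Pr\{M = 0\} + \sum_{m=1}^{2^{nR_1}} \Pr\{\mathcal{E}_m, M = m\} + \sum_{m=1}^{2^{nR_1}} \Pr\{\tilde{\mathcal{E}}_m, M = m\}
\end{aligned}
\tag{33}
\]
where $\mathcal{E}_m$ is the event $\{T_\epsilon(X^n, \hat{Y}^n(m)) = 0\}$, and $\tilde{\mathcal{E}}_m$ is the event that there is an $m' \in C(j)$ such that $m \in C(j)$, $m' \ne m$, and $T_\epsilon(X^n, \hat{Y}^n(m')) = 1$. From (31), we have
\[
\begin{aligned}
&\Pr\{\mathcal{E}_m, M = m\} \\
&= \Pr\{T_\epsilon(X^n, \hat{Y}^n(m)) = 0, S_\epsilon(Y^n, \hat{Y}^n(m)) = 1, S_\epsilon(Y^n, \hat{Y}^n(m - 1)) = 0, \ldots, S_\epsilon(Y^n, \hat{Y}^n(1)) = 0\} \\
&\le \Pr\{T_\epsilon(X^n, Y^n, \hat{Y}^n(m), Z^n) = 0, S_\epsilon(Y^n, \hat{Y}^n(m)) = 1, S_\epsilon(Y^n, \hat{Y}^n(m - 1)) = 0, \ldots, S_\epsilon(Y^n, \hat{Y}^n(1)) = 0\} \\
&= \int \left[ \int \Pr\{T_\epsilon(x^n, y^n, \hat{Y}^n(m), z^n) = 0, S_\epsilon(y^n, \hat{Y}^n(m)) = 1\} \, p(x^n, z^n | y^n) \, dx^n \, dz^n \right] \cdot \prod_{m'=1}^{m-1} \Pr\{S_\epsilon(y^n, \hat{Y}^n(m')) = 0\} \, p(y^n) \, dy^n \\
&= \int \left[ \int \Pr\{T_\epsilon(X^n, y^n, \hat{y}^n, Z^n) = 0\} \, S_\epsilon(y^n, \hat{y}^n) \, p(\hat{y}^n) \, d\hat{y}^n \right] \cdot \prod_{m'=1}^{m-1} \Pr\{S_\epsilon(y^n, \hat{Y}^n(m')) = 0\} \, p(y^n) \, dy^n \\
&\le \epsilon \cdot \Pr\{S_\epsilon(Y^n, \hat{Y}^n(m)) = 1, S_\epsilon(Y^n, \hat{Y}^n(m - 1)) = 0, \ldots, S_\epsilon(Y^n, \hat{Y}^n(1)) = 0\} \\
&= \epsilon \cdot \Pr\{M = m\},
\end{aligned}
\tag{34}
\]
where the equality on the fourth line is due to the i.i.d. nature of $\hat{Y}^n(1), \ldots, \hat{Y}^n(2^{nR_1})$, the equality on the fifth line results from the fact that $p(x^n, z^n | y^n) = p(x^n, z^n | y^n, \hat{y}^n)$ (since $(X, Z) \to Y \to \hat{Y}$), so that the inner probability is taken with $(X^n, Z^n)$ distributed according to $p(x^n, z^n | y^n, \hat{y}^n)$, and the inequality on the second last line is from the definition of the indicator function $S_\epsilon$.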



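Similarly, assuming $m \in C(j)$, we have from (31)
\[
\begin{aligned}
\Pr\{\tilde{\mathcal{E}}_m, M = m\} &\le \sum_{\substack{m' \in C(j) \\ m' \ne m}} \Pr\{T_\epsilon(X^n, \hat{Y}^n(m')) = 1, S_\epsilon(Y^n, \hat{Y}^n(m)) = 1\} \\
&= \sum_{\substack{m' \in C(j) \\ m' \ne m}} \Pr\{T_\epsilon(X^n, \hat{Y}^n(m')) = 1\} \cdot \Pr\{S_\epsilon(Y^n, \hat{Y}^n(m)) = 1\} \\
&\le 2^{n(R_1 - R_2)} \cdot 2^{-n(I(X; \hat{Y}) - 3\epsilon)} \cdot \frac{2^{-n(R_1 - 7\epsilon)}}{1 - \epsilon} \\
&= \frac{2^{-n(R_1 + 8\epsilon)}}{1 - \epsilon},
\end{aligned}
\tag{35}
\]
where the equality on the second line is due to the independence between $\hat{Y}^n(m')$ and $\hat{Y}^n(m)$, and the last inequality results from Part 2) of Lemma 4.1 and the bound $\Pr\{T_\epsilon(X^n, \hat{Y}^n(m')) = 1\} \le 2^{-n(I(X; \hat{Y}) - 3\epsilon)}$, which is a direct result of [27, Theorem 15.2.2]. Hence, substituting the bounds in (34) and (35) back into (33) and using Part 1) of Lemma 4.2, we obtain
\[
\Pr\{K \ne L\} \le 2\epsilon + \epsilon \cdot \sum_{m=1}^{2^{nR_1}} \Pr\{M = m\} + \sum_{m=1}^{2^{nR_1}} \frac{2^{-n(R_1 + 8\epsilon)}}{1 - \epsilon} \le 2\epsilon + \epsilon + \frac{2^{-8n\epsilon}}{1 - \epsilon} < 4\epsilon
\tag{36}
\]
for $n$ sufficiently large.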

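Next we consider the event $\{\tilde{M} \ne M\}$. Define $\mathcal{F}_m$ as the event $\{T_\epsilon(\hat{Y}^n(m), Z^n) = 0\}$ and $\tilde{\mathcal{F}}_m$ as the event that there is an $m' \in C(j, l)$ such that $m \in C(j, l)$, $m' \ne m$, and $T_\epsilon(\hat{Y}^n(m'), Z^n) = 1$. Then we have, when $n$ is sufficiently large, uniformly for all $j = 1, 2, \ldots, 2^{nR_2}$ and $l = 1, 2, \ldots, 2^{nR_3}$,
\[
\begin{aligned}
&\Pr\{\tilde{M} \ne M | J = j, L = l\} \\
&\le \sum_{m \in C(j, l)} \Pr\{\mathcal{F}_m, M = m | J = j, L = l\} + \sum_{m \in C(j, l)} \Pr\{\tilde{\mathcal{F}}_m, M = m | J = j, L = l\} \\
&\le \sum_{m \in C(j, l)} \epsilon \cdot \Pr\{M = m | J = j, L = l\} + \sum_{m \in C(j, l)} \frac{2^{-n(R_1 + 7\epsilon)}}{1 - \epsilon} \cdot \frac{1}{\Pr\{J = j, L = l\}} \\
&\le \epsilon + 2^{nR_4} \cdot \frac{2^{-n(R_1 + 7\epsilon)}}{1 - \epsilon} \cdot \frac{1}{(1 - \epsilon)^3 \cdot 2^{-n(R_1 - R_4 + 6\epsilon)}} \\
&= \epsilon + \frac{2^{-n\epsilon}}{(1 - \epsilon)^4} < 2\epsilon.
\end{aligned}
\tag{37}
\]
Note that the inequality on the third line of (37) results from upper bounds of $\Pr\{\mathcal{F}_m, M = m\}$ and $\Pr\{\tilde{\mathcal{F}}_m, M = m\}$, which can be obtained in ways almost identical to the derivations in (34) and (35), respectively. The inequality on the fourth line is, on the other hand, due to Part 4) of Lemma 4.2.

By expurgating the random code ensemble, we obtain the following lemma.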

Lemma 4.3: For any $\epsilon > 0$ and $n$ sufficiently large, there exists a code $C_n$ with the rates $R_2$, $R_3$, and $R_4$ given by (27) such that

1) $\Pr\{K \ne L | C = C_n\} < 8\epsilon$,

2) $\Pr\{\tilde{M} \ne M | C = C_n\} < 8\epsilon$,

3) $\Pr\{M = m | C = C_n\} \le \frac{2^{-n(R_1 - 7\epsilon)}}{1 - \epsilon}$ for all $m = 1, 2, \ldots, 2^{nR_1}$, and

4) $\Pr\{L = l | C = C_n\} \le 2^{-n(R_3 - 8\epsilon)}$ for all $l = 1, 2, \ldots, 2^{nR_3}$.

Proof: Combining Part 1) of Lemma 4.2, (36), and (37), we have
\[
\Pr\{M = 0\} + \Pr\{K \ne L\} + \Pr\{\tilde{M} \ne M\} < 8\epsilon
\]
for sufficiently large $n$. This implies that there must exist a $C_n$ satisfying $\Pr\{K \ne L | C = C_n\} < 8\epsilon$, $\Pr\{\tilde{M} \ne M | C = C_n\} < 8\epsilon$, and $\Pr\{M = 0 | C = C_n\} < 8\epsilon$. Thus, Parts 1) and 2) are proved.

Now, fix this $C_n$. For $m = 1, 2, \ldots, 2^{nR_1}$, let $\hat{y}^n(m)$ be the $m$th codeword of $C_n$. Then, by Part 3) of Lemma 4.1,
\[
\Pr\{M = m | C = C_n\} \le \Pr\{S_\epsilon(Y^n, \hat{y}^n(m)) = 1\} \le \frac{2^{-n(R_1 - 7\epsilon)}}{1 - \epsilon};
\]
hence, Part 3) results.

Note that, for $l = 1, 2, \ldots, 2^{nR_3}$,
\[
\Pr\{L = l | C = C_n\} = \Pr\{L = l | M = 0, C = C_n\} \Pr\{M = 0 | C = C_n\} + \Pr\{L = l, M > 0 | C = C_n\}.
\tag{38}
\]
We know from the discussion above that $\Pr\{L = l | M = 0, C = C_n\} \Pr\{M = 0 | C = C_n\} < 2^{-nR_3} \cdot 8\epsilon$. Also from Part 3) of the lemma,
\[
\Pr\{L = l, M > 0 | C = C_n\} = \sum_{m \in C_n(l)} \Pr\{M = m | C = C_n\} \le 2^{n(R_1 - R_3)} \cdot \frac{2^{-n(R_1 - 7\epsilon)}}{1 - \epsilon} = \frac{2^{-n(R_3 - 7\epsilon)}}{1 - \epsilon}.
\]
Putting these back into (38), we get
\[
\Pr\{L = l | C = C_n\} \le 2^{-n(R_3 - 8\epsilon)} \left[ 8\epsilon \cdot 2^{-8n\epsilon} + \frac{2^{-n\epsilon}}{1 - \epsilon} \right] \le 2^{-n(R_3 - 8\epsilon)}
\]
for sufficiently large $n$. Thus, Part 4) is proved.

In the remainder of the paper, we use a fixed code $C_n$ identified by Lemma 4.3. For convenience, we drop the conditioning on $C_n$.

D. Secrecy Analysis 

First we proceed to bound $H(K)$. Note that
\[
\begin{aligned}
H(K) &= H(L) + H(K|L) - H(L|K) \\
&\ge H(L) - H(L|K).
\end{aligned}
\tag{39}
\]
Using Part 1) of Lemma 4.3 together with Fano's inequality gives $H(L|K) \le 1 + 8n\epsilon R_3$. Moreover Part 4) of Lemma 4.3 implies that $H(L) \ge n(R_3 - 8\epsilon)$. Putting these bounds back into (39), we have
\[
R_3 - (8R_3 + 8)\epsilon - \frac{1}{n} \le \frac{1}{n} H(K) \le R_3.
\tag{40}
\]
Next we bound $I(K; Z^n, J)$. Note that
\[
\begin{aligned}
I(K; Z^n, J) &= I(L; Z^n, J) + I(K; Z^n, J | L) - I(L; Z^n, J | K) \\
&\le I(L; Z^n, J) + I(K; Z^n, J | L) \\
&\le I(L; Z^n, J) + H(K|L) \\
&\le I(L; Z^n, J) + 8n\epsilon R_3 + 1
\end{aligned}
\tag{41}
\]
where the last inequality is obtained from Part 1) of Lemma 4.3 and Fano's inequality like before. In addition, it holds that
\[
\begin{aligned}
I(L; Z^n, J) &= H(L) - H(L | Z^n, J) \\
&= H(L) - H(L, J | Z^n) + H(J | Z^n) \\
&= H(L) + H(J | Z^n) - H(L, J, M | Z^n) + H(M | Z^n, L, J) \\
&\le H(L) + H(J) - H(M | Z^n) - H(L, J | M, Z^n) + H(M | Z^n, L, J) \\
&\le H(L) + H(J) + I(M; Z^n) - H(M) + 8nR_1\epsilon + 1,
\end{aligned}
\]
where the second last inequality follows from $H(J | Z^n) \le H(J)$, and the last inequality follows from $H(L, J | M, Z^n) = 0$ (by definition of $J$ and $L$) and $H(M | Z^n, L, J) \le 1 + 8nR_1\epsilon$ (by Fano's inequality applied to the fictitious receiver). By construction of the code $C_n$, it holds that $H(L) \le nR_3$ and $H(J) \le nR_2$. In addition, Part 3) of Lemma 4.3 implies $H(M) \ge n(R_1 - 8\epsilon)$. Finally, note that $I(M; Z^n) \le I(Y^n; Z^n) = nI(Y; Z)$ by the data-processing inequality applied to the Markov chain $M \to Y^n \to Z^n$ and the memoryless property of the channel between $Y^n$ and $Z^n$. Combining these observations and substituting the values of $R_1$, $R_2$, and $R_3$ given by (27) back into (41), we obtain
\[
\begin{aligned}
\frac{1}{n} I(K; Z^n, J) &\le R_2 + R_3 - R_1 + I(Y; Z) + (8R_1 + 8R_3 + 8)\epsilon + \frac{2}{n} \\
&\le I(Y; Z) - I(\hat{Y}; Z) + (8R_1 + 8R_3 + 9)\epsilon,
\end{aligned}
\]
when $n$ is sufficiently large. Without any rate limitation on the public channel, we can choose the transition probability $p(\hat{y}|y)$ such that $I(Y; Z) - I(\hat{Y}; Z) \le \epsilon$; therefore,
\[
\frac{1}{n} I(K; Z^n, J) \le (8R_1 + 8R_3 + 9)\epsilon.
\tag{42}
\]
Since $\epsilon > 0$ can be chosen arbitrarily, Part 1) of Lemma 4.3, (40), and (42) establish the achievability of the secret key rate $I(\hat{Y}; X) - I(\hat{Y}; Z)$.


V. Conclusion 

We evaluated the key capacity of the fast-fading MIMO wiretap channel. We found that spatial 
dimensionality provided by the use of multiple antennas at the source and destination can be employed 
to combat a channel-gain advantage of the eavesdropper over the destination. In particular if the source 
has more antennas than the eavesdropper, then the channel gain advantage of the eavesdropper can be 
completely overcome in the sense that the key capacity does not vanish when the eavesdropper channel 
gain advantage becomes asymptotically large. This is the most interesting observation of this paper, as
no eavesdropper CSI is needed at the source or destination to achieve the non-vanishing key capacity.